Abstract: We focus on one-sided, mixture-based stopping rules for the problem of sequentially testing a simple null
hypothesis against a composite alternative. For the latter, we consider two cases: either a discrete alternative or
a continuous alternative that can be embedded into an exponential family. For each case, we find a mixture-based 
stopping rule that is nearly minimax in the sense of minimizing the maximal Kullback-Leibler information. The 
proof of this result is based on finding an almost Bayes rule for an appropriate sequential decision problem and on 
high-order asymptotic approximations for the performance characteristics of arbitrary mixture-based stopping times. 
We also evaluate the asymptotic performance loss of certain intuitive mixture rules and verify the accuracy of our 
asymptotic approximations with simulation experiments. 

Keywords: Asymptotic optimality; Minimax tests; Mixture rules; One-sided sequential tests; Open-ended tests;
Power one tests. 

Subject Classifications: 62L10; 62L15; 60G40. 
1. INTRODUCTION 

1.1. Problem Formulation and Literature Review 

Let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of independent and identically distributed (iid) observations (generally vectors,
$X_n \in \mathbb{R}^d$) whose common distribution under the probability measure $P_0$ (the null hypothesis $H_0 : P = P_0$)
is $F_0$. There is no cost for sampling under $P_0$. However, sampling should be terminated as soon as possible
if there is sufficient evidence against $P_0$ and in favor of a class of probability measures $\mathcal{P}$ (an alternative
hypothesis $H : P \in \mathcal{P}$). The problem is to find an $\{\mathcal{F}_n\}$-stopping time that takes large values under $P_0$
and small values under every probability measure in $\mathcal{P}$, where $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$ is the sigma-algebra
generated by the first $n$ observations $X_1, \ldots, X_n$, $n \ge 1$.

When $\mathcal{P}$ consists of a single probability measure, say $\mathcal{P} = \{P_1\}$, and the $P_1$-distribution of $X_1$, $F_1$, is
absolutely continuous with respect to $F_0$, a definitive solution to this sequential hypothesis testing problem
is the one-sided Sequential Probability Ratio Test (SPRT)

$$T_A^1 = \inf\{n \ge 1 : \Lambda_n^1 \ge A\}, \qquad \inf\{\varnothing\} = \infty,$$

where $A > 1$ is a fixed level (threshold) and $\{\Lambda_n^1\}$ is the corresponding likelihood-ratio process, i.e.,

$$\Lambda_n^1 = \prod_{m=1}^{n} \frac{dF_1(X_m)}{dF_0(X_m)}, \qquad n \in \mathbb{N}.$$

The stopping time $T_A^1$ is often called an open-ended test or a test of power one, because under $P_0$ it fails to
terminate with positive probability ($P_0(T_A^1 < \infty) \le 1/A$), whereas it terminates almost surely under $\mathcal{P}$, i.e.,
$P_1(T_A^1 < \infty) = 1$. Furthermore, it follows from Chow et al. (1971, pp. 107-108) that if the threshold $A = A_\alpha$ is
selected so that $P_0(T_A^1 < \infty) = \alpha$, then

$$E_1[T_A^1] = \inf_{T \in C_\alpha} E_1[T],$$ (1.1)

where $E_1$ denotes expectation with respect to $P_1$ and $C_\alpha = \{T : P_0(T < \infty) \le \alpha\}$ is the class of stopping
times whose "error probability" is bounded by $\alpha$, $0 < \alpha < 1$.
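
As an illustration of the one-sided SPRT above, the following minimal sketch assumes the Gaussian mean-shift case $F_0 = N(0,1)$, $F_1 = N(\mu,1)$, for which each log-likelihood-ratio increment equals $\mu x - \mu^2/2$. The function name `one_sided_sprt` and the choice of distributions are assumptions made for this example only; they are not taken from the paper.

```python
import math
from typing import Iterable, Optional


def one_sided_sprt(xs: Iterable[float], mu: float, A: float) -> Optional[int]:
    """First n with log-likelihood ratio >= log(A), or None if never reached.

    Assumes F_0 = N(0, 1) and F_1 = N(mu, 1) (an illustrative choice), so
    each observation x adds mu * x - mu**2 / 2 to the log-likelihood ratio.
    """
    log_A = math.log(A)
    z = 0.0  # running log-likelihood ratio log(Lambda_n^1)
    for n, x in enumerate(xs, start=1):
        z += mu * x - 0.5 * mu * mu
        if z >= log_A:
            return n  # stop and reject the null hypothesis
    return None  # open-ended test: no stopping on this finite sample


if __name__ == "__main__":
    # Data sitting exactly at the alternative mean: drift 0.5 per step.
    print(one_sided_sprt([1.0] * 20, mu=1.0, A=10.0))
    # Data sitting at the null mean: the statistic drifts downward.
    print(one_sided_sprt([0.0] * 20, mu=1.0, A=10.0))
```

Returning `None` on a finite sample mirrors the open-ended nature of the test: under the null hypothesis the rule is designed to continue sampling with high probability.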

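When the alternative hypothesis is not simple, there have been extensions of the one-sided SPRT, but
none of them exhibits such an exact optimality property as (1.1) under every probability measure associated
with the alternative hypothesis $\mathcal{P}$. More specifically, suppose that $\mathcal{P} = \{P_\theta\}_{\theta \in \Theta \setminus \{0\}}$ and that the
$P_\theta$-distribution of $X_1$ belongs to the exponential family

$$\frac{dF_\theta(x)}{dF_0(x)} = e^{\theta x - \psi_\theta}, \qquad \theta \in \Theta = \{\theta \in \mathbb{R} : E_0[e^{\theta X_1}] < \infty\},$$ (1.2)

where $\psi_\theta = \log E_0[e^{\theta X_1}]$. Moreover, let $\Lambda_n^\theta$ be the likelihood ratio of $P_\theta$ versus $P_0$ based on the first $n$
observations, i.e.,

$$\Lambda_n^\theta = \prod_{k=1}^{n} \frac{dF_\theta(X_k)}{dF_0(X_k)} = \exp\Big\{\sum_{k=1}^{n} (\theta X_k - \psi_\theta)\Big\},$$ (1.3)

and let $I_\theta = E_\theta[\log \Lambda_1^\theta]$ denote the Kullback-Leibler divergence of $F_\theta$ versus $F_0$, where here and in what
follows $E_\theta$ stands for expectation with respect to $P_\theta$.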

A natural generalization of the one-sided SPRT is the threshold stopping time $\inf\{n \ge 1 : \Lambda_n^{\hat\theta_n} \ge A\}$,
where $\hat\theta_n$ is an estimate of the unknown parameter $\theta$ at time $n$. Lorden (1973) followed a generalized
likelihood ratio approach, where $\hat\theta_n$ is taken to be the maximum likelihood estimator (MLE) of $\theta$ based on
the first $n$ observations (see also Lai (2001) for two composite hypotheses and two-sided tests). Robbins and
Siegmund (1970, 1974) followed a non-anticipating estimation approach and considered $\hat\theta_n$ to be a one-step
delayed estimator that depends only on the first $n - 1$ observations. For the latter approach, we also refer to
Pollak and Yakir (1999), Pavlov (1990), Dragalin and Novikov (1999), and Lorden and Pollak (2005).

An alternative, mixture-based approach was used by Darling and Robbins (1968) (see also Robbins
(1970)), where the stopping rule has the form

$$T_A = \inf\{n \ge 1 : \Lambda_n \ge A\}$$ (1.4)

with $\{\Lambda_n\}$ being a weighted (mixed) likelihood-ratio statistic given by

$$\Lambda_n = \int_\Theta \Lambda_n^\theta \, G(d\theta), \qquad n \in \mathbb{N},$$ (1.5)

and $G$ being an arbitrary distribution function on $\Theta$. Assuming that $G$ has a positive and continuous density
with respect to the Lebesgue measure, Pollak and Siegmund (1975) obtained an asymptotic approximation
for $E_\theta[T_A]$ as $A \to \infty$. Based on this approximation, Pollak (1978) proved that if $\alpha = 1/A$ and $\bar\Theta$ is
an arbitrary, closed, finite interval, bounded away from 0, then

$$\inf_{T \in C_\alpha} \sup_{\theta \in \bar\Theta} I_\theta E_\theta[T] \ge |\log \alpha| + \log \sqrt{|\log \alpha|} + O(1) \quad \text{as } \alpha \to 0,$$ (1.6)

where $O(1)$ is bounded as $\alpha \to 0$, and that this asymptotic lower bound is attained by any mixture rule
whose mixing distribution has a positive and continuous density with support that includes $\bar\Theta$. Note that
$I_\theta E_\theta[T] = E_\theta[\log \Lambda_T^\theta]$ is the total Kullback-Leibler information in the trajectory $X_1^T = (X_1, \ldots, X_T)$ in
favor of the hypothesis $H_\theta : P = P_\theta$ versus $H_0 : P = P_0$, so that the problem of minimizing the maximal
value of $I_\theta E_\theta[T]$ can be interpreted as minimizing the Kullback-Leibler information in the least favorable
situation.

Lerche (1986) considered the problem of sequential testing for the drift of a Brownian motion in a 
Bayesian setup. 

1.2. Main Contributions 

One of the goals of this work is to extend the above work on mixture rules. In the framework of exponential
families, we show that a particular choice of the mixing density leads to a mixture rule $T_A$ that attains
$\inf_{T \in C_\alpha} \sup_{\theta \in \bar\Theta} (I_\theta E_\theta[T])$, not only up to an $O(1)$ term as in Pollak (1978), but up to an $o(1)$ term (see
Theorem 3.1).

However, the main emphasis is on the case that the alternative hypothesis $\mathcal{P}$ is a finite set, $\mathcal{P} =
\{P_1, \ldots, P_K\}$. In this setup, the weighted likelihood ratio statistic becomes

$$\Lambda_n = \sum_{i=1}^{K} p_i \Lambda_n^i,$$ (1.7)

where $\Lambda_n^i = \prod_{m=1}^{n} [dF_i(X_m) / dF_0(X_m)]$, $F_i$ is the $P_i$-distribution of $X_1$, which is assumed to be absolutely
continuous with respect to $F_0$, and $\{p_i\}$ is a probability mass function, i.e., $p_i \ge 0$ for every $i$ and $\sum_{i=1}^{K} p_i = 1$.
This is a more general framework than that of an exponential family, in that the distributions $F_i$ and $F_0$ are
not required to belong to the same (exponential) parametric family. Moreover, it can be seen as a discrete
approximation to the continuous setup (1.2). Such an approximation is necessary in practice, since the
continuously weighted likelihood ratio (1.5) is not usually implementable without such a discretization.
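
The discrete mixture statistic can be evaluated in the log domain, which avoids overflow when individual likelihood ratios grow exponentially in $n$. The sketch below is only one possible way to do this; the function name `log_mixture_lr` and its argument layout (one log-likelihood ratio $\log \Lambda_n^i$ per component) are assumptions made for illustration.

```python
import math
from typing import Sequence


def log_mixture_lr(log_lrs: Sequence[float], weights: Sequence[float]) -> float:
    """Return log(sum_i p_i * exp(log_lrs[i])) using the log-sum-exp trick.

    log_lrs[i] plays the role of log(Lambda_n^i) and weights[i] of p_i.
    Components with zero weight are skipped, so they never contribute.
    """
    terms = [math.log(p) + z for p, z in zip(weights, log_lrs) if p > 0.0]
    m = max(terms)  # factor out the largest term for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    # Two components with likelihood ratios 2 and 4 and equal weights.
    value = log_mixture_lr([math.log(2.0), math.log(4.0)], [0.5, 0.5])
    print(math.exp(value))
    # A very large log-likelihood ratio does not overflow.
    print(log_mixture_lr([1000.0, 0.0], [0.5, 0.5]))
```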

The main motivation for the discrete setup, though, is that it arises naturally in many applications.
Consider, for example, the so-called L-sample slippage problem, where there are L sources of observations
("channels" or "populations") and there are two possibilities for the distribution of each source (in and
out of control). This problem has a variety of important applications, in particular in cybersecurity (see
Tartakovsky et al. (2006a,b)) and in target detection (see Tartakovsky and Veeravalli (2004); Tartakovsky et
al. (2003)).

Our main contribution in the discrete setup is that we find a mixing distribution $\{p_i^0\}$ which makes the
corresponding mixture test nearly minimax in the sense that it attains $\inf_{T \in C_\alpha} \max_i (I_i E_i[T])$ up to an $o(1)$
term as $\alpha \to 0$, where $I_i$ is the Kullback-Leibler distance between $F_i$ and $F_0$ (see Theorem 2.2). The main
components of the proof are finding a nearly Bayes rule for a decision problem with non-homogeneous
sampling costs in $\mathcal{P}$ and obtaining a high-order asymptotic expansion for $E_i[T_A]$ up to an $o(1)$ term as well
as an asymptotic approximation for the "error probability" $P_0(T_A < \infty)$ as $A \to \infty$.

1.3. Misspecification and the Appropriate Minimax Criterion

As we will see, the expansion for $E_i[T_A]$ remains valid even when $p_i = 0$, as long as certain additional
conditions are satisfied (see (2.1)). That is, we allow the number of active components, $\hat K = \#\{p_i :
p_i \ne 0\}$, of an arbitrary mixture rule to be smaller than $K$. It is useful to incorporate this case in our
analysis, since the "true" distribution may not be included in $\mathcal{P}$. For example, in the slippage problem, the
actual number of out-of-control channels is typically not known in advance. Thus, the cardinality of $\mathcal{P}$ is
$K = \sum_{\ell=1}^{L} \binom{L}{\ell} = 2^L - 1$. However, if a designer assumes that only one channel can be out-of-control,
which is the hardest case to detect, the resulting mixture rule will assign a positive weight to only $L$ of the
$K$ probability measures in $\mathcal{P}$, so that $\hat K = L < K$. Another case where such a misspecification arises
naturally is when approximating a continuous alternative hypothesis with a discrete set of points. Then, it is
useful to evaluate the performance of the discrete mixture rule also between the points that were used for its
design.
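
The gap between the full slippage alternative and a single-channel design grows quickly with the number of channels. The short sketch below simply evaluates the count $\sum_{\ell=1}^{L} \binom{L}{\ell} = 2^L - 1$; the function name `num_slippage_alternatives` is hypothetical.

```python
from math import comb


def num_slippage_alternatives(L: int) -> int:
    """Number of non-empty subsets of L channels that may be out of control."""
    return sum(comb(L, ell) for ell in range(1, L + 1))


if __name__ == "__main__":
    for L in (3, 10, 20):
        K = num_slippage_alternatives(L)
        # A single-channel design puts positive weight on only L of the K measures.
        print(L, K, K - L)
```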

Finally, allowing some components of the mixing distribution to be 0 helps to explain why we chose to
design a sequential test that attains asymptotically $\inf_{T \in C_\alpha} \max_i (I_i E_i[T])$ instead of $\inf_{T \in C_\alpha} \max_i E_i[T]$,
which would be the straightforward minimax criterion. Indeed, in Subsection 2.6 we will see that when
the Kullback-Leibler numbers $\{I_i\}$ are not identical, the latter criterion cannot be attained asymptotically,
not even up to a first order, by a mixture rule that gives positive weights to all of its components. Thus,
minimizing the maximal expected sample size is an inappropriate criterion, since it dictates the use of a
sequential test, $T^*$, that will not even be uniformly first-order asymptotically optimal, i.e., the ratio
$E_i[T^*] / \inf_{T \in C_\alpha} E_i[T]$ will not converge to 1 as $\alpha \to 0$ for every $1 \le i \le K$.

On the other hand, the criterion $\inf_{T \in C_\alpha} \max_i (I_i E_i[T])$ leads to a non-trivial mixture test with $p_i > 0$ for
every $1 \le i \le K$, which (just like any other fully-supported mixture rule) attains $\inf_{T \in C_\alpha} E_i[T]$ as $\alpha \to 0$
up to a constant for every $1 \le i \le K$. Moreover, it is a natural minimax criterion since, as we already
mentioned above, $\max_i (I_i E_i[T]) = \max_i E_i[\log \Lambda_T^i]$ is the maximum Kullback-Leibler distance between
$\mathcal{P}$ and $P_0$ based on the observations up to time $T$. Thus, this criterion provides a natural and meaningful
way to express the minimax property and select a particular mixture rule for our problem.

1.4. Anscombe's Condition and Nonlinear Renewal Theory 

We would like at this point to highlight the connection of our work with the celebrated paper of Anscombe
(1952), which introduced the notion of uniform continuity in probability and showed that it
constitutes a sufficient condition for preserving convergence in distribution when using random times. More
specifically, Anscombe called a sequence $\{\xi_n\}$ uniformly continuous in probability (u.c.i.p.), if for every
$\epsilon > 0$ there exists a $\delta > 0$ such that

$$P\Big(\max_{0 \le k \le n\delta} |\xi_{n+k} - \xi_n| > \epsilon\Big) < \epsilon \quad \text{for every } n \in \mathbb{N}.$$ (1.8)

Moreover, he proved that if a u.c.i.p. sequence converges in distribution to a random variable $\xi$ as $n \to
\infty$ and $\{t_c\}$ is a family of positive integer-valued random variables such that $t_c/c$ converges in probability
as $c \to \infty$ to a finite limit, then $\{\xi_{t_c}\}$ also converges in distribution to $\xi$ as $c \to \infty$. This theorem has had a profound
impact on the field of Sequential Analysis, since it provided the basis for developing Central Limit Theorems
(CLTs) for stopped random walks and families of stopping times. However, the notion of uniform continuity
in probability plays an important role in a much wider range of sequential problems, including the one we
consider in this paper. The reason is its deep connection with nonlinear renewal theory, which is the main
tool that we use in order to describe the asymptotic performance of mixture rules. The corresponding
analysis for continuous mixture rules was done by Pollak and Siegmund (1975), who first used such ideas
before a general theory was presented by Lai and Siegmund (1977, 1979).

More specifically, assuming that $p_i > 0$, we can decompose the logarithm of the mixture statistic (1.7) as
$\log \Lambda_n = \log \Lambda_n^i + Y_n^i$, where $Y_n^i$ is defined in (2.12) below. The idea then is that the asymptotic distribution
of the overshoot $\log(\Lambda_{T_A}/A)$ as $A \to \infty$ will be the same as if $Y_n^i$ was 0, as long as $Y_n^i$, $n = 1, 2, \ldots$,
are "slowly changing" compared to the $P_i$-random walk $\{\log \Lambda_n^i\}$. This observation leads to an accurate
approximation for $P_0(T_A < \infty)$, and it is also the basis for the high-order expansion of $E_i[T_A]$ (for which
additional integrability and convergence conditions on $Y_n^i$ are required).

Nonlinear renewal theory makes the above argument rigorous by formalizing the notion of a "slowly
changing" sequence. Specifically, $\{\xi_n\}$ is said to be slowly changing, if it is uniformly continuous in
probability and satisfies the probabilistic growth condition

$$\max_{0 \le k \le n} |\xi_k| = o_p(n) \quad \text{as } n \to \infty,$$ (1.9)

i.e., $n^{-1} \max_{0 \le k \le n} |\xi_k| \to 0$ in probability. Therefore, uniform continuity in probability is at the core of
nonlinear renewal theory, being the key condition that allows us to understand the behavior of overshoots
of perturbed random walks, and consequently, a variety of "sequential objects", such as the mixture-based
sequential tests that we consider in this paper.

Finally, we should note that using Anscombe's theorem we can establish the asymptotic normality of
the (standardized) mixture stopping rules $\{T_A\}$ as $A \to \infty$. Whereas we do not need this property for our
purposes, it is useful since it justifies using the expectation of $T_A$ in order to quantify its performance.

1.5. Organization of the Paper 

The rest of the paper is organized as follows. In Section 2, we focus on discrete mixture rules and study 
their asymptotic performance and optimality properties. In Section 3, we consider the case of an exponential 
family with continuous parameter. Section 4 illustrates our findings with simulation experiments in the 
normal case. In Section 5, we discuss ramifications of our work in testing of two hypotheses and in sequential 
change detection, and we conclude in Section 6. 

2. DISCRETE MIXTURE RULES 

In this section we assume that $\mathcal{P} = \{P_i\}_{i=1,\ldots,K}$ and we let $\{p_i\}$ be an arbitrary probability mass function,
i.e., $p_i \ge 0$ for every $i = 1, \ldots, K$ and $\sum_{i=1}^{K} p_i = 1$.

2.1. Notation and Assumptions 

Let $\Lambda_n$ be as defined in (1.7) and let $Z_n = \log \Lambda_n$. Then the mixture rule (1.4) calls for stopping and
accepting the hypothesis $H : P \in \mathcal{P}$ (rejecting the null hypothesis $H_0 : P = P_0$) at

$$T_A = \inf\{n \ge 1 : Z_n \ge \log A\},$$ (2.1)

where $T_A = \infty$ if there is no such $n$. For every $i = 1, \ldots, K$, we set

$$Z_n^i = \log \Lambda_n^i = \sum_{m=1}^{n} \log \frac{dF_i(X_m)}{dF_0(X_m)}, \qquad n \in \mathbb{N},$$ (2.2)

and we define the one-sided SPRTs

$$T_A^i = \inf\{n \ge 1 : \Lambda_n^i \ge A\} = \inf\{n \ge 1 : Z_n^i \ge \log A\},$$ (2.3)

where $A > 1$ is a fixed threshold.

For every $i, j = 1, \ldots, K$, we assume that $0 < E_j|Z_1^i| < \infty$, where $E_j[\cdot]$ refers to expectation with
respect to $P_j$, and we set

$$I_j = E_j[Z_1^j] \quad \text{and} \quad I_{ji} = E_j[Z_1^j - Z_1^i] = I_j - E_j[Z_1^i],$$ (2.4)

i.e., $I_j$ ($I_{ji}$) is the Kullback-Leibler divergence of $F_j$ versus $F_0$ ($F_i$). Therefore, $\{Z_n^i\}_{n \ge 1}$ is a random walk
under $P_j$ whose increments have mean $E_j[Z_1^i] = I_j - I_{ji}$. If $E_j[Z_1^i] > 0$, or equivalently $I_j > I_{ji}$, then, by
renewal theory, the asymptotic distribution of the overshoot $\eta_A^i = Z^i_{T_A^i} - \log A$ under $P_j$ is well-defined and
we denote it as

$$\mathcal{H}_{j|i}(x) = \lim_{A \to \infty} P_j(\eta_A^i \le x), \qquad x \ge 0.$$

More specifically, $\mathcal{H}_{j|i}$ can be defined in terms of the ladder variables of the $P_j$-random walk $\{Z_n^i\}$. For
the sake of brevity, we write $\mathcal{H}_i = \mathcal{H}_{i|i}$ for the asymptotic distribution of $\eta_A^i$ under $P_i$, which is always
well-defined since $E_i[Z_1^i] = I_i > 0$.

With a change of measure $P_0 \mapsto P_i$ it can be easily shown that

$$A \, P_0(T_A^i < \infty) = A \, E_i\big[(1/\Lambda^i_{T_A^i}) \, 1_{\{T_A^i < \infty\}}\big] = E_i\big[e^{-\eta_A^i} \, 1_{\{T_A^i < \infty\}}\big] \to \delta_i \quad \text{as } A \to \infty,$$ (2.5)

where $\delta_i$ is the Laplace transform of $\mathcal{H}_i$, i.e.,

$$\delta_i = \int_0^\infty e^{-x} \, \mathcal{H}_i(dx) = \lim_{A \to \infty} E_i[e^{-\eta_A^i}].$$ (2.6)

Note that the quantity $\delta_i$ is also very important when designing the one-sided test $T_A^i$. More specifically,
Lorden (1977) showed that if $c$ is the cost of every observation, then the one-sided SPRT $T_A^i$ with $A = \delta_i I_i / c$
attains $\inf_T [P_0(T < \infty) + c \, E_i[T]]$, where the infimum is taken over all stopping times.

If $E_j[\max\{0, Z_1^i\}^2] < \infty$, then from Wald's identity, (2.4) and renewal theory (Woodroofe (1982,
Corollary 2.2)), we have

$$[I_j - I_{ji}] \, E_j[T_A^i] = \log A + \kappa_{j|i} + o(1) \quad \text{as } A \to \infty,$$ (2.7)

where $\kappa_{j|i}$ is the average of $\mathcal{H}_{j|i}$, i.e.,

$$\kappa_{j|i} = \int_0^\infty x \, \mathcal{H}_{j|i}(dx) = \lim_{A \to \infty} E_j[\eta_A^i].$$ (2.8)

It is a direct consequence of (2.7) that

$$I_i \, E_i[T_A^i] = \log A + \kappa_i + o(1) \quad \text{as } A \to \infty,$$ (2.9)

where $\kappa_i = \kappa_{i|i}$. In the next section, we show that the limiting average overshoots $\kappa_1, \ldots, \kappa_K$ completely
determine the (optimal) mixing distribution of the nearly minimax mixture rule.

If $P_0(T_A^i < \infty) = \alpha$, where $\alpha$ is a predefined number ($0 < \alpha < 1$), then (2.5) and (2.9) imply that

$$I_i \, E_i[T_A^i] = |\log \alpha| + \log(\delta_i \, e^{\kappa_i}) + o(1) \quad \text{as } \alpha \to 0.$$ (2.10)

Due to (1.1), this is the optimal asymptotic performance under $P_i$ up to an $o(1)$ term. Therefore, asymptotic
approximation (2.10) provides a benchmark for the performance of any stopping time under $P_i$.

In order to study the performance of $T_A$ under $P_i$ even if $p_i = 0$, for every $i = 1, \ldots, K$ we define the
index

$$i^* = \arg\max_{j : p_j > 0} E_i[Z_1^j] = \arg\min_{j : p_j > 0} I_{ij}$$ (2.11)

and we assume that it is unique. When $p_i > 0$, this is obviously the case since $i^* = i$. On the other
hand, when $p_i = 0$, $i^*$ represents the "active" index that is closest to $i$, in the sense of the Kullback-Leibler
distance for the corresponding distributions. Thus, assuming that $i^*$ is unique, we exclude the case that there
are two or more active indexes that are "equidistant" from $i$ when $p_i = 0$. Then, for every $i = 1, \ldots, K$, we
have the decomposition $Z_n = Z_n^{i^*} + Y_n^{i^*}$, where

$$Y_n^{i^*} = \log p_{i^*} + \log\Big(1 + \sum_{j \ne i^*} \frac{p_j}{p_{i^*}} \frac{\Lambda_n^j}{\Lambda_n^{i^*}}\Big), \qquad n \in \mathbb{N}.$$ (2.12)

Based on this decomposition and the fact that when $i^*$ is unique the sequence $\{Y_n^{i^*}\}$ is slowly changing,
we are able to use nonlinear renewal theory and understand the asymptotic behavior of the mixture rule
$T_A$. When $p_i = 0$ and $i^*$ is not unique, this decomposition is not valid and this case has to be considered
separately. We do not consider this case here, since this would break the flow of the presentation without
adding any insight to our main points. Methods similar to those developed in Dragalin et al. (2000) and
Tartakovsky et al. (2003) can be used for this purpose.
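
The decomposition $Z_n = Z_n^{i^*} + Y_n^{i^*}$ in (2.12) is an algebraic identity and can be checked numerically at a single time $n$. The following sketch does so for hypothetical values of the component log-likelihood ratios; the function name `decompose` and the test values are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def decompose(log_lrs: Sequence[float], weights: Sequence[float],
              i_star: int) -> Tuple[float, float]:
    """Return (Z_n^{i*}, Y_n^{i*}) as in (2.12) for one fixed time n.

    log_lrs[j] plays the role of Z_n^j and weights[j] of p_j; the weight
    of the reference component i_star must be positive.
    """
    p_star = weights[i_star]
    z_star = log_lrs[i_star]
    ratio_sum = sum(
        (p / p_star) * math.exp(z - z_star)
        for j, (p, z) in enumerate(zip(weights, log_lrs))
        if j != i_star
    )
    y = math.log(p_star) + math.log1p(ratio_sum)
    return z_star, y


if __name__ == "__main__":
    log_lrs = [1.0, -0.5, 0.2]
    weights = [0.5, 0.3, 0.2]
    z_star, y = decompose(log_lrs, weights, i_star=0)
    z_mix = math.log(sum(p * math.exp(z) for p, z in zip(weights, log_lrs)))
    print(z_star + y, z_mix)  # the two values should agree
```

Note that $Y_n^{i^*} \ge \log p_{i^*}$ always holds, because the sum inside the second logarithm is non-negative.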

Finally, in the case $p_i = 0$, we will also need the following Cramér-type condition:

Condition 1. For every $j \ne i^*$ with $p_j > 0$ there exists $\gamma_j > 0$ such that $g_j(\gamma_j) = 1$ and $g_j'(\gamma_j) < \infty$,
where $g_j(t) = E_i[e^{t (Z_1^j - Z_1^{i^*})}]$.

2.2. Modes of Asymptotic Optimality 

Ideally, we would like to find an optimal test $T_{opt} \in C_\alpha$ that attains the minimal expected sample size $\inf_{T \in C_\alpha} E_i[T]$
for all $i = 1, \ldots, K$, where $C_\alpha = \{T : P_0(T < \infty) \le \alpha\}$. Since this is an extremely difficult task (if at
all possible), we would like to find a test $T_o \in C_\alpha$ that attains $\inf_{T \in C_\alpha} E_i[T]$ at least asymptotically for
all $i = 1, \ldots, K$. We distinguish between the following three notions of asymptotic optimality. We say
that $T_o$ minimizes $\inf_{T \in C_\alpha} E_i[T]$ to first-order if $E_i[T_o] = \inf_{T \in C_\alpha} E_i[T] \, (1 + o(1))$; to second-order if
$E_i[T_o] = \inf_{T \in C_\alpha} E_i[T] + O(1)$; and to third-order if $E_i[T_o] = \inf_{T \in C_\alpha} E_i[T] + o(1)$, where $O(1)$ is
asymptotically bounded and $o(1)$ an asymptotically vanishing term as $\alpha \to 0$.

Since the one-sided SPRT $T_A^i$ is exactly optimal under $P_i$, it follows from (2.10) that

$$\inf_{T \in C_\alpha} E_i[T] = \frac{1}{I_i} \big[ |\log \alpha| + \log(\delta_i \, e^{\kappa_i}) \big] + o(1) \quad \text{as } \alpha \to 0.$$

Using this fact along with Theorem 2.1, we will see that a mixture rule is second-order asymptotically
optimal under every $P_i \in \mathcal{P}$ if and only if it assigns positive weights to all probability measures in the
alternative hypothesis, that is $p_i > 0$ for every $i = 1, \ldots, K$. In other words, for every fully-supported
mixture test $T_A$ with $P_0(T_A < \infty) = \alpha$, the expectation $E_i[T_A]$ has a bounded distance from $\inf_{T \in C_\alpha} E_i[T]$
as $A \to \infty$ for every $i = 1, \ldots, K$.

2.3. Asymptotic Performance 

The main result of this subsection is Theorem 2.1, which provides a high-order asymptotic approximation
for $E_i[T_A]$ as $A \to \infty$. Its proof is based on Lemmas 2.1-2.4. In Lemma 2.1 we present the main properties
of the sequence $\{Y_n^{i^*}\}$, in Lemma 2.2 we obtain sufficient conditions for $T_A$ to have power 1 under $P_i$, and
in Lemmas 2.3 and 2.4 we obtain asymptotic approximations for $\log P_0(T_A < \infty)$ and $E_i[T_A]$ in terms of
the threshold $A$.

Lemma 2.1. For every $i$, $P_i(Y_n^{i^*} \to \log p_{i^*}) = 1$, and hence the sequence $\{Y_n^{i^*}\}$ is slowly changing under
$P_i$. Moreover, if either $p_i > 0$ or if $p_i = 0$ and Condition 1 is satisfied, then there exists $\gamma_{i^*} > 0$ such that
the following asymptotic equality holds

$$P_i\Big(\max_{0 \le k \le n} |Y_k^{i^*} - \log p_{i^*}| > x\Big) = O(e^{-\gamma_{i^*} x}) \quad \text{as } x \to \infty.$$ (2.13)

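Proof. From (2.12) it follows directly that $Y_n^{i^*} \ge \log p_{i^*}$. Moreover, by the strong law of large numbers,

$$\frac{1}{n} \log \frac{\Lambda_n^j}{\Lambda_n^{i^*}} = \frac{Z_n^j - Z_n^{i^*}}{n} \xrightarrow[n \to \infty]{} E_i[Z_1^j - Z_1^{i^*}] = I_{ii^*} - I_{ij} \quad P_i\text{-a.s. for every } j \ne i^*.$$

Since $I_{ii^*} < I_{ij}$ (by the definition of $i^*$), it follows that $P_i(\Lambda_n^j / \Lambda_n^{i^*} \to 0) = 1$ for every $j \ne i^*$ with $p_j > 0$,
and consequently, $P_i(Y_n^{i^*} \to \log p_{i^*}) = 1$. As a result, $\{Y_n^{i^*}\}$ satisfies (1.8) and (1.9). Thus, it is a slowly
changing sequence under $P_i$.

To prove (2.13), suppose first that $p_i > 0$. Then, $i^* = i$ and $\sum_{j \ne i} \Lambda_n^j / \Lambda_n^i$ is a $P_i$-martingale with mean
$K - 1$. Thus, from (2.12) and Doob's submartingale inequality we obtain

$$P_i\Big(\max_{0 \le k \le n} |Y_k^i - \log p_i| > x\Big) = P_i\Big(1 + \max_{0 \le k \le n} \sum_{j \ne i} \frac{p_j}{p_i} \frac{\Lambda_k^j}{\Lambda_k^i} > e^x\Big) \le \frac{K - 1}{p_i (e^x - 1)},$$

which implies that (2.13) holds with $\gamma_i = 1$.

Suppose now that $p_i = 0$, in which case $i^* \ne i$. Then, working as in (2.14) and using the following
inclusion

$$\Big\{\max_{0 \le k \le n} \sum_{j \ne i^*:\, p_j > 0} \frac{\Lambda_k^j}{\Lambda_k^{i^*}} > y\Big\} \subseteq \bigcup_{j \ne i^*:\, p_j > 0} \Big\{\max_{0 \le k \le n} \frac{\Lambda_k^j}{\Lambda_k^{i^*}} > \frac{y}{K - 1}\Big\},$$

which holds for every positive constant $y$, we obtain

$$\begin{aligned}
P_i\Big(\max_{0 \le k \le n} |Y_k^{i^*} - \log p_{i^*}| > x\Big)
&\le P_i\Big(\max_{0 \le k \le n} \sum_{j \ne i^*:\, p_j > 0} \frac{\Lambda_k^j}{\Lambda_k^{i^*}} > p_{i^*} (e^x - 1)\Big) \\
&\le \sum_{j \ne i^*:\, p_j > 0} P_i\Big(\max_{0 \le k \le n} \frac{\Lambda_k^j}{\Lambda_k^{i^*}} > \frac{p_{i^*} (e^x - 1)}{K - 1}\Big) \\
&= \sum_{j \ne i^*:\, p_j > 0} P_i\Big(\max_{0 \le k \le n} [Z_k^j - Z_k^{i^*}] > x + \Theta(1)\Big),
\end{aligned}$$

where $\Theta(1)$ is a term that is asymptotically bounded from above and from below as $x \to \infty$. For every
$j \ne i^*$, the process $\{Z_n^j - Z_n^{i^*}\}_{n \ge 1}$ is a $P_i$-random walk whose increments have mean $E_i[Z_1^j - Z_1^{i^*}]$,
which is negative due to the definition of $i^*$. Thus, by Condition 1, for every $j \ne i^*$ with $p_j > 0$ there exists
a positive constant $\gamma_j$ such that

$$P_i\Big(\max_{0 \le k \le n} [Z_k^j - Z_k^{i^*}] > x + \Theta(1)\Big) = O(e^{-\gamma_j x}),$$

which implies that (2.13) is satisfied with $\gamma_{i^*} = \min\{\gamma_j : j \ne i^*, \, p_j > 0\}$. □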

Lemma 2.2. If either $p_i > 0$ or $p_i = 0$ but $I_i > I_{ii^*}$, then $P_i(T_A < \infty) = 1$ for every $A > 1$ and
$P_i(T_A \to \infty) = 1$ as $A \to \infty$.

Proof. First of all, we observe that $\{Z_n^{i^*}\}$ is a $P_i$-random walk whose increments have mean $E_i[Z_1^{i^*}] =
I_i - I_{ii^*}$. Due to the assumption of the lemma, the latter is positive, and therefore, $P_i(Z_n^{i^*} \to \infty) = 1$ as
$n \to \infty$. Since

$$T_A = \inf\{n \ge 1 : Z_n^{i^*} + Y_n^{i^*} \ge \log A\},$$ (2.14)

and, by Lemma 2.1, $P_i(Y_n^{i^*} \to \log p_{i^*}) = 1$, we conclude that $T_A$ terminates $P_i$-a.s. and that
$P_i(T_A \to \infty) = 1$ as $A \to \infty$. □

Lemma 2.3. For every $A > 1$, $T_A$ is a test of level $1/A$, i.e., $P_0(T_A < \infty) \le 1/A$. Moreover, if for every $i$
such that $p_i > 0$ the distribution of $Z_1^i$ is non-arithmetic, then

$$A \, P_0(T_A < \infty) \to \sum_{i : p_i > 0} p_i \delta_i \quad \text{as } A \to \infty.$$ (2.15)

Proof. Define the probability measure $\bar P = \sum_{i : p_i > 0} p_i P_i$. If $p_i > 0$, then by Lemma 2.2, $P_i(T_A < \infty) = 1$,
and therefore, $\bar P(T_A < \infty) = 1$. Moreover,

$$\frac{d\bar P}{dP_0}\Big|_{\mathcal{F}_n} = \sum_{i=1}^{K} p_i \Lambda_n^i = \Lambda_n.$$ (2.16)

Therefore, if $\bar E[\cdot]$ denotes expectation with respect to $\bar P$, change of measure $P_0 \mapsto \bar P$ yields

$$A \, P_0(T_A < \infty) = A \, \bar E[e^{-Z_{T_A}}] = \bar E\big[e^{-(Z_{T_A} - \log A)}\big] \le 1,$$ (2.17)

which proves the first assertion. Furthermore, from (2.17) and the definition of $\bar P$ we have

$$A \, P_0(T_A < \infty) = \sum_{i : p_i > 0} p_i \, E_i\big[e^{-(Z_{T_A} - \log A)}\big].$$ (2.18)

If $p_i > 0$, then $i^* = i$ and we have the decomposition $Z_n = Z_n^i + Y_n^i$, where $\{Z_n^i\}$ is a $P_i$-random walk
with positive mean $I_i$ and $\{Y_n^i\}$ is a slowly changing sequence under $P_i$. Therefore, if also the distribution
of $Z_1^i$ is non-arithmetic, then $Z_{T_A} - \log A$ converges weakly as $A \to \infty$ to $\mathcal{H}_i(\cdot)$ under $P_i$ (see Woodroofe
(1982, Theorem 4.1)). Thus, recalling the definition of $\delta_i$ in (2.6) and applying the Bounded Convergence
Theorem, from (2.18) we obtain (2.15). This completes the proof. □

Lemma 2.4. Suppose that $Z_1^{i^*}$ has a non-arithmetic distribution with a finite second moment under $P_i$. If
either $p_i > 0$ or $I_i > I_{ii^*}$ and Condition 1 holds, then

$$(I_i - I_{ii^*}) \, E_i[T_A] = \log A + \kappa_{i|i^*} - \log p_{i^*} + o(1) \quad \text{as } A \to \infty.$$ (2.19)

Proof. Write $D_{ii^*} = I_i - I_{ii^*}$. Since $\{Z_n^{i^*}\}_{n \ge 1}$ is a $P_i$-random walk whose increments have non-arithmetic
distribution and positive mean $E_i[Z_1^{i^*}] = D_{ii^*}$, asymptotic approximation (2.19) follows from Woodroofe's
nonlinear renewal theorem (see Theorem 4.5 in Woodroofe (1982)), as long as the following conditions
are satisfied:

(A1) $\{\max_{0 \le k \le n} |Y^{i^*}_{n+k} - \log p_{i^*}|\}_{n \ge 1}$ is a uniformly integrable sequence;
(A2) $\sum_{n=0}^{\infty} P_i(Y_n^{i^*} - \log p_{i^*} \le -n\epsilon) < \infty$ for some $\epsilon \in (0, D_{ii^*})$;
(A3) $\{Y_n^{i^*} - \log p_{i^*}\}_{n \ge 1}$ converges in distribution;
(A4) $P_i(T_A \le N_A) = o(1/N_A)$ as $A \to \infty$ for some $\epsilon > 0$, where $N_A = \lfloor (\epsilon \log A) / D_{ii^*} \rfloor$.

Condition (A1) is satisfied because $\sup_n |Y_n^{i^*} - \log p_{i^*}|$ is $P_i$-integrable. Indeed, from (2.13), which holds
if either $p_i > 0$ or Condition 1 holds (see Lemma 2.1), we have

$$E_i\Big[\sup_n |Y_n^{i^*} - \log p_{i^*}|\Big] = \lim_{n \to \infty} \int_0^\infty P_i\Big(\max_{0 \le k \le n} (Y_k^{i^*} - \log p_{i^*}) > x\Big) \, dx < \infty.$$

Condition (A2) is clearly satisfied, since $Y_n^{i^*} \ge \log p_{i^*}$ for every $n$, whereas condition (A3) is also satisfied,
since $\{Y_n^{i^*} - \log p_{i^*}\}$ converges to 0 $P_i$-a.s.

In order to verify (A4), we start with the following inclusion, which holds for every $n \in \mathbb{N}$ and $x > 0$,

$$\Big\{\max_{0 \le k \le n} Z_k \ge x\Big\} \subseteq \Big\{\max_{0 \le k \le n} Z_k^{i^*} \ge x/2\Big\} \cup \Big\{\max_{0 \le k \le n} Y_k^{i^*} \ge x/2\Big\},$$

and which implies that

$$P_i(T_A \le N_A) \le P_i\Big(\max_{0 \le k \le N_A} Z_k \ge \log A\Big) \le P_i\Big(\max_{0 \le k \le N_A} Z_k^{i^*} \ge \log \sqrt{A}\Big) + P_i\Big(\max_{0 \le k \le N_A} Y_k^{i^*} \ge \log \sqrt{A}\Big).$$

Therefore, it suffices to show that both terms on the right-hand side are of order $o(1/\log A)$ as $A \to \infty$.
Consider the second term. If $p_i > 0$, then $i^* = i$ and, by (2.14),

$$P_i\Big(\max_{0 \le k \le N_A} |Y_k^i - \log p_i| \ge \log \sqrt{A}\Big) \le \frac{K - 1}{p_i (\sqrt{A} - 1)}.$$

Now, if $p_i = 0$ and Condition 1 is satisfied, then by (2.13)

$$P_i\Big(\max_{0 \le k \le N_A} |Y_k^{i^*} - \log p_{i^*}| \ge \log \sqrt{A}\Big) = O(A^{-\gamma_{i^*}/2}) \quad \text{as } A \to \infty.$$

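Finally, consider the first term. Since $D_{ii^*} N_A \le \epsilon \log A$, for $\epsilon < 1/2$ we have

$$\begin{aligned}
P_i\Big(\max_{0 \le k \le N_A} Z_k^{i^*} \ge \log \sqrt{A}\Big)
&= P_i\Big(\max_{0 \le k \le N_A} (Z_k^{i^*} - D_{ii^*} N_A) \ge \tfrac{1}{2} \log A - D_{ii^*} N_A\Big) \\
&\le P_i\Big(\max_{0 \le k \le N_A} (Z_k^{i^*} - D_{ii^*} N_A) \ge \tfrac{1 - 2\epsilon}{2\epsilon} D_{ii^*} N_A\Big) \\
&\le P_i\Big(\max_{0 \le k \le N_A} (Z_k^{i^*} - D_{ii^*} k) \ge \gamma N_A\Big)
\end{aligned}$$

for some $\gamma > 0$. Write $S_k = Z_k^{i^*} - D_{ii^*} k$ and $\sigma^2 = E_i S_1^2$ (which is finite by the conditions of the lemma). Note
that $\{S_k\}_{k \ge 1}$ is a zero-mean $P_i$-martingale, so that $\{S_k^2\}_{k \ge 1}$ is a submartingale with respect to $P_i$. Applying
Doob's maximal submartingale inequality, we obtain

$$P_i\Big(\max_{0 \le k \le N_A} |S_k| \ge \gamma N_A\Big) \le \frac{1}{(\gamma N_A)^2} E_i\Big[S_{N_A}^2 \, 1_{\{\max_{0 \le k \le N_A} |S_k| \ge \gamma N_A\}}\Big] = \frac{1}{\gamma^2 N_A} E_i\Big[\frac{S_{N_A}^2}{N_A} \, 1_{\{\max_{0 \le k \le N_A} |S_k| \ge \gamma N_A\}}\Big].$$

First, it follows that

$$P_i\Big(\max_{0 \le k \le N_A} |S_k| \ge \gamma N_A\Big) \le \frac{\sigma^2}{\gamma^2 N_A} \xrightarrow[N_A \to \infty]{} 0.$$

Now, we show that

$$E_i\Big[\frac{S_{N_A}^2}{N_A} \, 1_{\{\max_{0 \le k \le N_A} |S_k| \ge \gamma N_A\}}\Big] \xrightarrow[N_A \to \infty]{} 0$$

as long as $E_i S_1^2 = \sigma^2 < \infty$, which implies that

$$P_i\Big(\max_{0 \le k \le N_A} |S_k| \ge \gamma N_A\Big) = o(1/N_A) \quad \text{as } A \to \infty,$$

i.e., the desired result. By the Central Limit Theorem, $S_{N_A}^2 / (N_A \sigma^2)$ converges as $A \to \infty$ in distribution to
a standard chi-squared random variable with one degree of freedom, which we denote by $\chi_1^2$. Hence, for any $L < \infty$ we have

$$\begin{aligned}
E_i\Big[\frac{S_{N_A}^2}{N_A} \, 1_{\{\max_{0 \le k \le N_A} |S_k| \ge \gamma N_A\}}\Big]
&= E_i\Big[\Big(L \wedge \frac{S_{N_A}^2}{N_A}\Big) 1_{\{\max_{0 \le k \le N_A} |S_k| \ge \gamma N_A\}}\Big] + E_i\Big[\Big(\frac{S_{N_A}^2}{N_A} - L \wedge \frac{S_{N_A}^2}{N_A}\Big) 1_{\{\max_{0 \le k \le N_A} |S_k| \ge \gamma N_A\}}\Big] \\
&\le L \, P_i\Big(\max_{0 \le k \le N_A} |S_k| \ge \gamma N_A\Big) + E_i\Big[\frac{S_{N_A}^2}{N_A} - L \wedge \frac{S_{N_A}^2}{N_A}\Big] \\
&\le \frac{L \sigma^2}{\gamma^2 N_A} + \sigma^2 - E_i\Big[L \wedge \frac{S_{N_A}^2}{N_A}\Big] \\
&\xrightarrow[N_A \to \infty]{} \sigma^2 - E\big[L \wedge \sigma^2 \chi_1^2\big] \xrightarrow[L \to \infty]{} \sigma^2 (1 - 1) = 0.
\end{aligned}$$

The proof is complete. □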



Now everything is prepared to obtain an asymptotic approximation for the expected sample size up to
the negligible term $o(1)$.

Theorem 2.1. Suppose that $Z_1^{i^*}$ has a non-arithmetic distribution with a finite second moment under $P_i$ and
that either $p_i > 0$ or $p_i = 0$ and $I_i > I_{ii^*}$ and Condition 1 holds. Then

$$(I_i - I_{ii^*}) \, E_i[T_A] = |\log P_0(T_A < \infty)| + \log\Big(\sum_{j : p_j > 0} p_j \delta_j\Big) + \kappa_{i|i^*} - \log p_{i^*} + o(1) \quad \text{as } A \to \infty.$$ (2.20)

Proof. Using (2.15), we obtain

$$\log A = |\log P_0(T_A < \infty)| + \log\Big(\sum_{j : p_j > 0} p_j \delta_j\Big) + o(1).$$ (2.21)

We can then obtain (2.20) by combining (2.19) and (2.21). □



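Remark 2.1. If the desired error probability $P_0(T_A < \infty) = \alpha$ is fixed in advance, usually it is not possible
to choose the threshold $A = A_\alpha$ so that $T_A$ is a test of size $\alpha$, i.e., so that $P_0(T_A < \infty)$ is exactly equal to
$\alpha$. Nevertheless, if $A = \big(\sum_{j : p_j > 0} p_j \delta_j\big) / \alpha$, then from (2.15) and (2.20) we have

$$P_0(T_A < \infty) = \alpha \, (1 + o(1)),$$

$$(I_i - I_{ii^*}) \, E_i[T_A] = |\log \alpha| + \log\Big(\sum_{j : p_j > 0} p_j \delta_j\Big) + \kappa_{i|i^*} - \log p_{i^*} + o(1) \quad \text{as } \alpha \to 0.$$ (2.22)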
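
The threshold choice in Remark 2.1 can be sketched as follows, assuming the constants $\delta_j$ are already available (for instance from a renewal-theoretic formula or from simulation). The function name `threshold_for_level` and the numerical values below are hypothetical.

```python
import math
from typing import Sequence


def threshold_for_level(alpha: float, weights: Sequence[float],
                        deltas: Sequence[float]) -> float:
    """Threshold A = (sum_j p_j * delta_j) / alpha suggested by (2.15).

    With this A the error probability equals alpha only asymptotically,
    i.e. up to a factor 1 + o(1); it is not an exact calibration.
    """
    mix = sum(p * d for p, d in zip(weights, deltas) if p > 0.0)
    return mix / alpha


if __name__ == "__main__":
    alpha = 0.01
    A = threshold_for_level(alpha, weights=[0.5, 0.5], deltas=[0.6, 0.8])
    # Since every delta_j is at most 1, A never exceeds the conservative 1/alpha.
    print(A, 1.0 / alpha)
```

Because each $\delta_j$ is a Laplace transform evaluated at 1, it lies in $(0, 1]$, so this threshold is never larger than the conservative choice $A = 1/\alpha$ implied by Lemma 2.3.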

The following corollary specializes Theorem 2.1 in the case that $p_i > 0$.

Corollary 2.1. Suppose that $p_i > 0$ and that $Z_1^i$ has a non-arithmetic distribution with a finite second
moment under $P_i$. If $P_0(T_A < \infty) = \alpha$, then

$$I_i \, E_i[T_A] = |\log \alpha| + \log\Big(\sum_{j : p_j > 0} p_j \delta_j\Big) + \kappa_i - \log p_i + o(1) \quad \text{as } \alpha \to 0$$ (2.23)

and $T_A$ is second-order asymptotically optimal under $P_i$, that is,

$$E_i[T_A] = \inf_{T \in C_\alpha} E_i[T] + O(1) \quad \text{as } \alpha \to 0.$$ (2.24)

This corollary implies that the performance loss of a mixture rule is bounded as $A \to \infty$ under every
$P_i \in \mathcal{P}$, as long as $p_i > 0$ for every $i = 1, \ldots, K$. However, when the number of "active" components in
the mixing distribution, $\hat K = \#\{p_i : p_i > 0\}$, is very large, only first-order asymptotic optimality can be
attained. This is the content of the following corollary of Theorem 2.1.

Corollary 2.2. Suppose that $p_i > 0$ and that $Z_1^i$ has a non-arithmetic distribution with a finite second
moment under $P_i$. If $P_0(T_A < \infty) = \alpha$ and $\hat K \to \infty$ so that $\log \hat K = o(|\log \alpha|)$, then $T_A$ is first-order
asymptotically optimal under $P_i$, i.e., $E_i[T_A] = \inf_{T \in C_\alpha} E_i[T] \, (1 + o(1))$ as $\alpha \to 0$.

2.4. A Nearly Minimax Discrete Mixture Rule 

The proof of minimaxity is based on an auxiliary Bayesian approach. The method is conceptually
similar to that used by Lorden (1977) and goes back to the proof of optimality of Wald's SPRT given
by Wald and Wolfowitz (1948).

More specifically, consider the following Bayesian problem denoted by $\mathcal{B}(\pi, \{p_i\}, c)$. Let $\pi \in (0, 1)$ be
the prior probability of the null hypothesis $H_0 : P = P_0$, and assume that the losses associated with stopping
at time $T$ are 1 if $T < \infty$ and the hypothesis $H_0$ is true and $(c \cdot I_i) \times T$ if $P_i$ is the true probability measure,
where $c > 0$ is a fixed constant. Therefore, the cost of every observation under $P_i$ is proportional to the
difficulty of discriminating between $F_i$ and $F_0$ measured by the Kullback-Leibler divergence $I_i$. Since the
prior probability of the alternative hypothesis $H : P \in \mathcal{P}$ is $(1 - \pi) \sum_{i=1}^{K} p_i = (1 - \pi)$, the
Bayes (integrated) risk associated with an arbitrary stopping time $T$ is

$$\mathcal{R}_c(T) = \pi \, P_0(T < \infty) + c \, (1 - \pi) \sum_{i=1}^{K} p_i \, I_i \, E_i[T].$$ (2.25)

Moreover, for any positive constant $Q$ such that $Qc < \pi$, we consider the mixture rule $T_{A_{Qc}}$, where

$$A_{Qc} = \frac{1 - Qc}{Qc} \Big/ \frac{1 - \pi}{\pi}.$$ (2.26)

These stopping times have a natural Bayesian interpretation. Indeed, write $P^p = \sum_{i=1}^{K} p_i P_i$ and $P^\pi =
\pi P_0 + (1 - \pi) P^p$. Then

$$P^\pi(\,\cdot \cap H_0) = \pi \, P_0(\cdot), \qquad P^\pi(\,\cdot \cap H) = (1 - \pi) \, P^p(\cdot),$$

and the posterior probability of the hypothesis $H_0$ takes the form

$$\Pi_n = P^\pi(H_0 \mid \mathcal{F}_n) = \frac{1}{1 + \frac{1 - \pi}{\pi} \Lambda_n}, \qquad n \in \mathbb{N}.$$

Thus, $T_{A_{Qc}}$ is the first time that the posterior probability of the null hypothesis becomes smaller than $Qc$,
that is,

$$T_{A_{Qc}} = \inf\{n \ge 1 : \Lambda_n \ge A_{Qc}\} = \inf\{n \ge 1 : \Pi_n \le Qc\}.$$ (2.27)
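
The equivalence in (2.27) between thresholding the mixture statistic and thresholding the posterior probability of the null hypothesis can be checked numerically. In the sketch below the function names `bayes_threshold` and `posterior_null` and the particular values of $Q$, $c$ and $\pi$ are assumptions chosen only for illustration (they satisfy $Qc < \pi$).

```python
import math


def bayes_threshold(Q: float, c: float, pi: float) -> float:
    """Threshold A_{Qc} of (2.26); requires Q * c < pi so that A_{Qc} > 1."""
    return ((1.0 - Q * c) / (Q * c)) / ((1.0 - pi) / pi)


def posterior_null(lam: float, pi: float) -> float:
    """Posterior probability of H_0 given that the mixture statistic equals lam."""
    return 1.0 / (1.0 + (1.0 - pi) / pi * lam)


if __name__ == "__main__":
    Q, c, pi = 1.0, 0.1, 0.5
    A = bayes_threshold(Q, c, pi)
    for lam in (0.5, 5.0, 8.9, 9.1, 20.0):
        # Both stopping conditions should agree for every value of lam.
        print(lam, lam >= A, posterior_null(lam, pi) <= Q * c)
```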

Solution of $\mathcal{B}(\pi, \{p_i\}, c)$ requires minimization of the expected loss (2.25). In the following lemma we
establish Bayesian optimality of the mixture test $T_{A_{Qc}}$ in the problem $\mathcal{B}(\pi, \{p_i\}, c)$ for sufficiently small $c$.

Lemma 2.5. For any given $\pi \in (0, 1)$ and $Q > 1/e$, there exists $c^*$ such that

$$\mathcal{R}_c(T_{A_{Qc}}) = \inf_T \mathcal{R}_c(T) \quad \text{for every } c < \pi c^*,$$

where the infimum is taken over all stopping times.

The proof of Lemma 2.5 is methodologically similar to the proof of Lemma 2 in Pollak (1978) (see also
Lorden (1967)) and is presented in the Appendix. This lemma provides the basis for the following important
theorem, which shows that a particular mixing distribution leads to a mixture rule that is almost minimax in
the sense of minimizing the Kullback-Leibler information in the worst-case scenario up to an $o(1)$ term.

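Theorem 2.2. Let $\mathcal{C}_\alpha = \{T : P_0(T < \infty) \le \alpha\}$ be the class of stopping times whose "error probabilities"
are at most $\alpha$, $0 < \alpha < 1$. Suppose that $E_i|Z_1^i|^2 < \infty$ and that $Z_1^i$ is $P_i$-non-arithmetic. Then

\[ \inf_{T \in \mathcal{C}_\alpha} \max_{i=1,\dots,K} I_i E_i[T] \ge |\log \alpha| + \log\Big(\sum_{i=1}^K \delta_i e^{\varkappa_i}\Big) + o(1) \quad \text{as } \alpha \to 0, \tag{2.28} \]

and this asymptotic lower bound is attained by the mixture rule $T_A = T_A(p^0)$ defined in (2.1) whose mixing
distribution is

\[ p_i^0 = \frac{e^{\varkappa_i}}{\sum_{j=1}^K e^{\varkappa_j}}, \quad i = 1, \dots, K, \tag{2.29} \]

and whose error probability is exactly equal to $\alpha$, i.e., the threshold $A = A_\alpha$ is selected in such a way that
$P_0(T_A(p^0) < \infty) = \alpha$.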

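Proof. Let $\{p_i\}$ be an arbitrary mixing distribution, $\pi = 1/2$, $Q > 1/e$, and choose $c < 1/(2Q)$ so that
$P_0(T_{A_{Qc}} < \infty) = \alpha$ (recall the definition of $A_{Qc}$ in (2.26)). Then from (A.2) in the appendix it follows that
$\alpha \le 2Qc$, and from the definition of $\mathcal{R}_c$ we obtain the following inequality:

\[ \frac{\alpha}{2} + \frac{c}{2} \inf_{T \in \mathcal{C}_\alpha} \max_{i=1,\dots,K} I_i E_i[T] \ge \inf_{T \in \mathcal{C}_\alpha} \mathcal{R}_c(T). \tag{2.30} \]

By Lemma 2.5, there exists $c^* < 1/Q$ such that for every $c < c^*/2$ (and consequently for every $\alpha < Qc^*$):

\[ \inf_{T \in \mathcal{C}_\alpha} \mathcal{R}_c(T) = \mathcal{R}_c(T_{A_{Qc}}) = \frac{\alpha}{2} + \frac{c}{2} \sum_{i=1}^K p_i I_i E_i[T_{A_{Qc}}]. \tag{2.31} \]

Consequently, from (2.30) and (2.31) it follows that

\[ \inf_{T \in \mathcal{C}_\alpha} \max_{i=1,\dots,K} I_i E_i[T] \ge \sum_{i=1}^K p_i I_i E_i[T_{A_{Qc}}]. \tag{2.32} \]

It remains to show that if $\{p_i\}$ is chosen according to (2.29), then

\[ \sum_{i=1}^K p_i^0 I_i E_i[T_{A_{Qc}}] = |\log \alpha| + \log\Big(\sum_{i=1}^K \delta_i e^{\varkappa_i}\Big) + o(1) \quad \text{as } \alpha \to 0. \tag{2.33} \]

Substituting the mixing distribution (2.29) in (2.23), we obtain that, as $\alpha \to 0$,

\[ I_i E_i[T_{A_{Qc}}] = |\log \alpha| + \log\Big(\sum_{j=1}^K \delta_j e^{\varkappa_j}\Big) + o(1), \quad i = 1, \dots, K, \tag{2.34} \]

which implies (2.28). Since by construction $P_0(T_{A_{Qc}} < \infty) = \alpha$, it also follows from (2.34) that

\[ \max_{i=1,\dots,K} I_i E_i[T_A] = |\log \alpha| + \log\Big(\sum_{i=1}^K \delta_i e^{\varkappa_i}\Big) + o(1) \quad \text{as } \alpha \to 0 \]

whenever $A = A_\alpha$ is chosen so that $P_0(T_A < \infty) = \alpha$. The proof is complete. □

Therefore, Theorem 2.2 implies that if the threshold $A = A_\alpha$ is selected so that $P_0(T_A < \infty) = \alpha$
and the mixing distribution $p = p^0$ is given by (2.29), then the test $T_A(p^0)$ is third-order asymptotically
minimax, i.e., as $\alpha \to 0$,

\[ \inf_{T \in \mathcal{C}_\alpha} \max_{1 \le i \le K} \big(I_i E_i[T]\big) = \max_{1 \le i \le K} \big(I_i E_i[T_A(p^0)]\big) + o(1) \]

and

\[ \max_{1 \le i \le K} \big(I_i E_i[T_A(p^0)]\big) = |\log \alpha| + \log\Big(\sum_{i=1}^K \delta_i e^{\varkappa_i}\Big) + o(1). \]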
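As a small numerical illustration of (2.28) and (2.29), the sketch below computes the equalizing weights $p_i^0 \propto e^{\varkappa_i}$ and the constant $\log(\sum_i \delta_i e^{\varkappa_i})$. The input values are the $\varkappa_i$ and $\delta_i$ reported in Table 1 for the Gaussian example of Section 4; the function names are assumptions introduced for this example.

```python
import math

# Limiting average overshoots and delta_i for the Gaussian example
# (values copied from Table 1; alternatives with means 1, 2, 3).
KAPPA = (0.718, 1.747, 3.146)
DELTA = (0.560, 0.320, 0.190)


def optimal_mixing(kappa):
    """Mixing distribution (2.29): p_i proportional to exp(kappa_i)."""
    weights = [math.exp(k) for k in kappa]
    total = sum(weights)
    return [w / total for w in weights]


def minimax_constant(kappa, delta):
    """The term log(sum_i delta_i * exp(kappa_i)) appearing in (2.28)."""
    return math.log(sum(d * math.exp(k) for k, d in zip(kappa, delta)))


p0 = optimal_mixing(KAPPA)
const = minimax_constant(KAPPA, DELTA)
print([round(p, 3) for p in p0], round(const, 3))
```

Because $\varkappa_i - \log p_i^0$ does not depend on $i$, every component attains the same value of $I_i E_i[T_A]$ up to $o(1)$, which is why this choice acts as an equalizer.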

2.5. Asymptotic Minimax Performance of Mixture Rules 

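The minimax performance loss of an arbitrary mixture rule $T_A = T_A(p)$ with mixing prior $p = \{p_i\}$ and
error probability $P_0(T_A < \infty) = \alpha$ can be naturally defined as follows:

\[ \mathcal{L}_\alpha(T_A(p)) = \max_{1 \le i \le K} \big(I_i E_i[T_A]\big) - \inf_{T \in \mathcal{C}_\alpha} \max_{1 \le i \le K} \big(I_i E_i[T]\big). \tag{2.35} \]

Corollary 2.1 implies that if $T_A(p)$ gives positive weights to all of its components, i.e., $p_i > 0$ for every
$1 \le i \le K$, then

\[ I_i E_i[T_A] = |\log \alpha| + \log\Big(\sum_{j=1}^K p_j \delta_j\Big) + \varkappa_i - \log p_i + o(1), \quad 1 \le i \le K, \]

and consequently,

\[ \max_{1 \le i \le K} \big(I_i E_i[T_A]\big) = |\log \alpha| + \log\Big[\Big(\sum_{j=1}^K p_j \delta_j\Big) \Big(\max_{1 \le i \le K} \big(e^{\varkappa_i}/p_i\big)\Big)\Big] + o(1). \tag{2.36} \]

Therefore, based on (2.28) and (2.36), for relatively small $\alpha$ we can approximate the performance loss (2.35)
of an arbitrary mixture rule $T_A(p)$ with mixing distribution $p = \{p_i\}$ as follows:

\[ \mathcal{L}(p) = \log\Big[\Big(\sum_{j=1}^K p_j \delta_j\Big) \Big(\max_{1 \le i \le K} \big(e^{\varkappa_i}/p_i\big)\Big)\Big] - \log\Big[\sum_{i=1}^K e^{\varkappa_i} \delta_i\Big]
= \log \frac{\big(\sum_{j=1}^K p_j \delta_j\big) \big(\max_{1 \le i \le K} (e^{\varkappa_i}/p_i)\big)}{\sum_{i=1}^K e^{\varkappa_i} \delta_i}, \tag{2.37} \]

where $\mathcal{L}(p) = \lim_{\alpha \to 0} \mathcal{L}_\alpha(T_A(p))$ is the limiting (asymptotic) loss.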

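Clearly, $\mathcal{L}(p) \ge \mathcal{L}(p^0) = 0$ for any $p = \{p_i\}$, where $p^0 = \{p_i^0\}$ is the "optimal" mixing distribution
defined in (2.29). Along with the uniform mixing distribution $p^u = \{p_i^u\}$, $p_i^u = 1/K$ for every $1 \le i \le K$,
which would be perhaps the first choice for practical implementation, consider the following mixing
distributions:

\[ p_i^{KL} = \frac{I_i}{\sum_{j=1}^K I_j}, \qquad p_i^{1/\delta} = \frac{1/\delta_i}{\sum_{j=1}^K (1/\delta_j)}, \qquad p_i^{e^\varkappa/\delta} = \frac{e^{\varkappa_i}/\delta_i}{\sum_{j=1}^K (e^{\varkappa_j}/\delta_j)}, \qquad 1 \le i \le K, \tag{2.38} \]

which resemble $p^0$ in that they all give more weight to those members of $\mathcal{P}$ that are further from $P_0$. Notice
also that in the completely symmetric case, where the $P_i$-distribution of $\Lambda_1^i$ does not depend on $i$, these mixing
distributions reduce to uniform mixing $p^u$. Using (2.37), we obtain

\[ \mathcal{L}(p^{KL}) = \log \frac{\big(\sum_{j=1}^K \delta_j I_j\big) \max_{1 \le i \le K} (e^{\varkappa_i}/I_i)}{\sum_{i=1}^K \delta_i e^{\varkappa_i}}, \qquad
\mathcal{L}(p^{1/\delta}) = \log \frac{K \max_{1 \le i \le K} (\delta_i e^{\varkappa_i})}{\sum_{i=1}^K \delta_i e^{\varkappa_i}}, \]

\[ \mathcal{L}(p^{e^\varkappa/\delta}) = \log \frac{\big(\sum_{j=1}^K e^{\varkappa_j}\big) \max_{1 \le i \le K} \delta_i}{\sum_{i=1}^K \delta_i e^{\varkappa_i}}, \qquad
\mathcal{L}(p^u) = \log \frac{\big(\sum_{j=1}^K \delta_j\big) \max_{1 \le i \le K} e^{\varkappa_i}}{\sum_{i=1}^K \delta_i e^{\varkappa_i}}. \]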
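The limiting loss (2.37) is straightforward to evaluate for any candidate mixing distribution. The sketch below does so for the distributions in (2.38) and for the uniform and optimal choices, using the values of $I_i$, $\varkappa_i$, and $\delta_i$ reported in Table 1 for the Gaussian example; the function and dictionary names are assumptions made for this illustration.

```python
import math

KAPPA = (0.718, 1.747, 3.146)   # from Table 1
DELTA = (0.560, 0.320, 0.190)   # from Table 1
INFO = (0.5, 2.0, 4.5)          # Kullback-Leibler divergences I_i


def normalize(weights):
    """Rescale positive weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def limiting_loss(p, kappa=KAPPA, delta=DELTA):
    """Asymptotic loss (2.37) of the mixture rule with mixing distribution p."""
    mix_delta = sum(pj * dj for pj, dj in zip(p, delta))
    worst = max(math.exp(k) / pk for k, pk in zip(kappa, p))
    optimum = sum(d * math.exp(k) for k, d in zip(kappa, delta))
    return math.log(mix_delta * worst / optimum)


candidates = {
    "KL": normalize(INFO),
    "1/delta": normalize([1.0 / d for d in DELTA]),
    "exp(kappa)/delta": normalize([math.exp(k) / d for k, d in zip(KAPPA, DELTA)]),
    "uniform": normalize([1.0] * len(KAPPA)),
    "optimal": normalize([math.exp(k) for k in KAPPA]),
}
losses = {name: limiting_loss(p) for name, p in candidates.items()}
for name, loss in losses.items():
    print(f"{name:18s} {loss:.2f}")
```

Up to the rounding of the tabulated inputs, the computed losses agree with the values quoted in Section 4, and the optimal distribution has zero loss by construction.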



2.6. An Inefficient Minimax Mixture Rule 

We close this section by explaining why we chose to work with a "modified" minimax criterion, instead of
$\inf_{T \in \mathcal{C}_\alpha} \max_i E_i[T]$, which at first glance would be a more natural choice. The reason is that if we wanted
to design a mixture rule $T_A$ that would optimize the latter criterion (at least asymptotically), $T_A$ should be
an equalizer at least up to first order, i.e., $E_i[T_A]/E_j[T_A]$ should approach 1 as $A \to \infty$ for any
$1 \le i \ne j \le K$. However, assuming that $I_i > I_{ii^*}$ and that Condition 1 holds for every $i = 1, \dots, K$,
Theorem 2.1 implies that

\[ (I_i - I_{ii^*})\, E_i[T_A] = |\log \alpha|\,(1 + o(1)), \quad i = 1, \dots, K, \tag{2.39} \]

where $\alpha = P_0(T_A < \infty)$. Thus, a necessary condition for a mixture rule to attain $\inf_{T \in \mathcal{C}_\alpha} \max_i E_i[T]$
asymptotically is that

\[ I_i - I_{ii^*} = I_j - I_{jj^*}, \quad 1 \le i \ne j \le K. \tag{2.40} \]

But this condition is not satisfied in general by a non-trivial mixture stopping rule that gives a positive weight
to all of its components. Indeed, if $p_i > 0$ for every $i$, (2.40) holds only in the completely symmetric case
that $I_1 = \dots = I_K$. In general, this condition is satisfied by any mixture rule for which

\[ p_i > 0 \iff I_i = \min_j\, [I_j - I_{jj^*}]. \]

However, such a minimax mixture rule can be very inefficient; it is not even uniformly first-order
asymptotically optimal unless we are dealing with the symmetric case. Consider, for example, the slippage
problem with $K$ populations and suppose that only one population can be out of control and that
$I_1 < I_2 = \dots = I_K$. Then, if we wanted to attain $\inf_{T \in \mathcal{C}_\alpha} \max_i E_i[T]$, even asymptotically, we should use the one-sided
SPRT $T_A^1$, which is optimal under $P_1$ but ignores all other states of the alternative hypothesis. This is
clearly not a meaningful answer and shows that the seemingly natural minimax criterion $\inf_{T \in \mathcal{C}_\alpha} \max_i E_i[T]$
is not appropriate.

3. CONTINUOUS MIXTURE RULES FOR AN EXPONENTIAL FAMILY 
3.1. Notation and Assumptions 

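In this section we assume that $\mathcal{P} = \{P_\theta\}_{\theta \in \Theta}$, where $\Theta \subset \mathbb{R}$ is a finite interval bounded away from 0, and
that the $P_\theta$-distribution of $X_1$, $F_\theta$, is defined by (1.2). Recall the definition of the likelihood ratio $\Lambda_n^\theta$ in (1.3)
and write

\[ S_n^\theta = \log \Lambda_n^\theta = \sum_{k=1}^n \big[\theta X_k - \psi_\theta\big]. \]

Observe that $E_\theta[S_1^\theta] = E_\theta[\theta X_1 - \psi_\theta] = \theta \psi'_\theta - \psi_\theta = I_\theta$, where $I_\theta$ is the Kullback-Leibler divergence of $F_\theta$
and $F_0$. For every $\theta \in \Theta$, we define the corresponding one-sided SPRT and overshoot

\[ T_A^\theta = \inf\{n \ge 1 : S_n^\theta \ge \log A\}, \qquad \eta_A^\theta = S^\theta_{T_A^\theta} - \log A \ \text{ on } \{T_A^\theta < \infty\}. \]

For every $\tilde\theta, \theta \in \Theta$ such that $E_{\tilde\theta}[\theta X_1 - \psi_\theta] = \theta \psi'_{\tilde\theta} - \psi_\theta > 0$, we set

\[ \delta_{\tilde\theta|\theta} = \int_0^\infty e^{-x}\, \mathcal{H}_{\tilde\theta|\theta}(dx), \qquad \varkappa_{\tilde\theta|\theta} = \int_0^\infty x\, \mathcal{H}_{\tilde\theta|\theta}(dx), \tag{3.1} \]

where $\mathcal{H}_{\tilde\theta|\theta}$ is the asymptotic distribution of $\eta_A^\theta$ under $P_{\tilde\theta}$, i.e., $\mathcal{H}_{\tilde\theta|\theta}(x) = \lim_{A \to \infty} P_{\tilde\theta}(\eta_A^\theta \le x)$. For
brevity's sake, we write $\mathcal{H}_\theta = \mathcal{H}_{\theta|\theta}$, $\varkappa_\theta = \varkappa_{\theta|\theta}$, and $\delta_\theta = \delta_{\theta|\theta}$.

From (2.10) it follows that if $\alpha = P_0(T_A^\theta < \infty)$, then the optimal asymptotic performance under $P_\theta$ is

\[ I_\theta \inf_{T \in \mathcal{C}_\alpha} E_\theta[T] = I_\theta E_\theta[T_A^\theta] = |\log \alpha| + \log(\delta_\theta e^{\varkappa_\theta}) + o(1) \quad \text{as } \alpha \to 0. \tag{3.2} \]

Recall that in the continuous parameter case the mixture test $T_A$ is defined by (1.4) with the average
likelihood ratio process $\Lambda_n$ given by (1.5). Below we assume that the mixing distribution $G(\theta)$ has a continuous
density $g(\theta)$ with respect to the Lebesgue measure, in which case

\[ \Lambda_n = \int_\Theta \exp\{S_n^\theta\}\, g(\theta)\, d\theta, \quad n \in \mathbb{N}. \]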



3.2. Asymptotic Performance of Continuous Mixture Rules 

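The following lemma provides a higher-order asymptotic approximation for the expected sample size $E_\theta[T_A]$
for large threshold values.

Lemma 3.1. If $g$ is a positive and continuous mixing density on $\Theta$ and $P_0(T_A < \infty) = \alpha$, then for every
$\theta \in \Theta$

\[ I_\theta E_\theta[T_A] = |\log \alpha| + \log \sqrt{|\log \alpha|} - \frac{1 + \log(2\pi)}{2}
+ \log\Big(\frac{e^{\varkappa_\theta} \sqrt{\psi''_\theta / I_\theta}}{g(\theta)}\Big) + \log\Big(\int_\Theta \delta_{\tilde\theta}\, g(\tilde\theta)\, d\tilde\theta\Big) + o(1) \quad \text{as } \alpha \to 0. \tag{3.3} \]

Proof. From Pollak and Siegmund (1975) and Woodroofe (1982), p. 68, it follows that for every $\theta \in \Theta \setminus \{0\}$

\[ I_\theta E_\theta[T_A] = \log A + \log \sqrt{\log A} - \frac{1 + \log(2\pi)}{2} + \log\Big(\frac{e^{\varkappa_\theta} \sqrt{\psi''_\theta / I_\theta}}{g(\theta)}\Big) + o(1) \quad \text{as } A \to \infty. \tag{3.4} \]

Moreover, from Corollary 1 in Woodroofe (1982), p. 67 (see also Pollak (1986)) it follows that

\[ A\, P_0(T_A < \infty) \to \int_\Theta \delta_\theta\, g(\theta)\, d\theta, \]

and consequently,

\[ \log A = |\log \alpha| + \log\Big(\int_\Theta \delta_\theta\, g(\theta)\, d\theta\Big) + o(1). \tag{3.5} \]

We can now complete the proof by substituting (3.5) into (3.4). □

Asymptotic approximations (3.2) and (3.3) imply that any continuous mixture rule with positive and
continuous density on $\Theta$ minimizes the expected sample size to first order for every $\theta \in \Theta$, i.e.,

\[ E_\theta[T_A] = \inf_{T \in \mathcal{C}_\alpha} E_\theta[T]\,(1 + o(1)) \quad \text{as } \alpha \to 0 \ \text{ for all } \theta \in \Theta. \]

However, such a continuous mixture rule is not second-order asymptotically optimal for any $\theta \in \Theta$. More
specifically, the following asymptotic equality holds:

\[ E_\theta[T_A] - \inf_{T \in \mathcal{C}_\alpha} E_\theta[T] = O\big(\log \sqrt{|\log \alpha|}\big) \quad \text{for all } \theta \in \Theta. \]

In other words, the distance between $E_\theta[T_A]$ and the optimal asymptotic performance (3.2) under $P_\theta$ does
not remain bounded as $\alpha \to 0$ for any $\theta \in \Theta$.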





3.3. A Nearly Minimax Continuous Mixture Rule 



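In the following theorem we show that a particular continuous mixture rule is third-order asymptotically
minimax in the sense of minimizing the maximal Kullback-Leibler information $\sup_{\theta \in \Theta} I_\theta E_\theta[T]$ in the class
$\mathcal{C}_\alpha$, as $\alpha \to 0$.

Theorem 3.1. If the limiting average overshoot $\varkappa_\theta$ is a continuous function on $\Theta$, then

\[ \inf_{T \in \mathcal{C}_\alpha} \sup_{\theta \in \Theta} I_\theta E_\theta[T] \ge |\log \alpha| + \log \sqrt{|\log \alpha|} - \frac{1 + \log(2\pi)}{2}
+ \log\Big(\int_\Theta \delta_\theta\, e^{\varkappa_\theta} \sqrt{\psi''_\theta / I_\theta}\, d\theta\Big) + o(1) \quad \text{as } \alpha \to 0, \tag{3.6} \]

and this asymptotic lower bound is attained by the continuous mixture rule $T_A(g^0)$ whose mixing density is

\[ g^0(\theta) = \frac{e^{\varkappa_\theta} \sqrt{\psi''_\theta / I_\theta}}{\int_\Theta e^{\varkappa_{\tilde\theta}} \sqrt{\psi''_{\tilde\theta} / I_{\tilde\theta}}\, d\tilde\theta}, \quad \theta \in \Theta, \tag{3.7} \]

and for which $P_0(T_A(g^0) < \infty) = \alpha$.

Proof. Lower bound (3.6) can be established following the same steps as in the proof of Theorem 2.2. The
details are omitted.

In order to show that the mixture rule $T_A(g^0)$ with mixing density (3.7) attains the asymptotic lower
bound in (3.6), it suffices to substitute (3.7) into (3.3) to obtain that for every $\theta \in \Theta$

\[ I_\theta E_\theta[T_A] = |\log \alpha| + \log \sqrt{|\log \alpha|} - \frac{1 + \log(2\pi)}{2}
+ \log\Big(\int_\Theta \delta_{\tilde\theta}\, e^{\varkappa_{\tilde\theta}} \sqrt{\psi''_{\tilde\theta} / I_{\tilde\theta}}\, d\tilde\theta\Big) + o(1) \quad \text{as } \alpha \to 0. \tag{3.8} \]

This completes the proof. □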

Remark 3.1. Note that for (3.4) and (3.8) to hold, the mixing density (3.7) must be continuous, which
requires that $\varkappa_\theta$ be a continuous function, since $\psi''_\theta$ and $I_\theta = \theta \psi'_\theta - \psi_\theta$ are continuous. This is true at
least when the distribution of $S_1^\theta$ is continuous.

Typically, the computation of the optimal mixing density (3.7) requires discretization. An example
where such a discretization is not necessary is that of an exponential distribution. More specifically, suppose
that $dF_0(x) = e^{-x} dx$ and $dF_\theta(x) = (1 - \theta)\, e^{-(1 - \theta) x} dx$ for every $0 < \theta < 1$. Then $\psi_\theta = -\log(1 - \theta)$,
$I_\theta = \theta/(1 - \theta) + \log(1 - \theta)$, and the exact distribution of the overshoot $\eta_A^\theta$ is exponential with rate $(1 - \theta)/\theta$
for every $A > 1$. Therefore, $\mathcal{H}_\theta$ is an exponential distribution with rate $(1 - \theta)/\theta$, which implies that
$\varkappa_\theta = \theta/(1 - \theta)$ and $\delta_\theta = 1 - \theta$. As a result, mixing density (3.7) is completely specified up to the normalizing
constant

\[ \int_\Theta e^{\varkappa_\theta} \sqrt{\psi''_\theta / I_\theta}\, d\theta = \int_\Theta \frac{\exp\{\theta/(1 - \theta)\}}{(1 - \theta) \sqrt{\theta/(1 - \theta) + \log(1 - \theta)}}\, d\theta, \tag{3.9} \]

which can be computed numerically.
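One way to evaluate the normalizing constant (3.9) is a composite Simpson rule, sketched below. The interval $\Theta = [0.2, 0.8]$, the number of panels, and all function names are assumptions chosen for this illustration; the paper does not specify an interval or a quadrature method.

```python
import math


def kl(theta):
    """Kullback-Leibler divergence I_theta in the exponential example."""
    return theta / (1.0 - theta) + math.log(1.0 - theta)


def unnormalized_density(theta):
    """exp(kappa_theta) * sqrt(psi''_theta / I_theta), the integrand of (3.9)."""
    return math.exp(theta / (1.0 - theta)) / ((1.0 - theta) * math.sqrt(kl(theta)))


def simpson(f, lo, hi, panels=2000):
    """Composite Simpson rule on [lo, hi]; `panels` must be even."""
    h = (hi - lo) / panels
    total = f(lo) + f(hi)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


LO, HI = 0.2, 0.8   # hypothetical interval Theta, bounded away from 0 and 1
norm_const = simpson(unnormalized_density, LO, HI)


def optimal_density(theta):
    """Mixing density (3.7) on the assumed interval [LO, HI]."""
    return unnormalized_density(theta) / norm_const
```

The integrand behaves like $\sqrt{2}/\theta$ near $\theta = 0$ and grows rapidly as $\theta \to 1$, so the requirement that $\Theta$ be a finite interval bounded away from 0 also keeps the quadrature well behaved.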

Unfortunately, $\varkappa_\theta$ and $\delta_\theta$ do not have analogous closed-form expressions in terms of $\theta$ in general. Therefore,
it is typically difficult to compute the optimal mixing density $g^0$. Thus, in practice it may be more convenient
to choose the mixing density $g$ from the class of probability density functions on the whole parameter
space $\Theta$ that are conjugate to $f_\theta$, so that the resulting mixture rule is easily computable. However, such a
mixture rule will only be second-order asymptotically minimax over $\Theta$, as was shown by Pollak (1978).

In the following subsection, we consider another alternative to the nearly minimax continuous mixture
rule; we approximate $\Theta$ with a discrete set of points and we use the corresponding nearly minimax discrete
mixture test.

3.4. A Discrete Approximation

A practical alternative to the optimal continuous mixture rule is to approximate the interval $\Theta$ by a genuinely
discrete set, $\Theta_K = \{\theta_1, \dots, \theta_K\} \subset \Theta$. In this case, the discrete mixture likelihood ratio statistic takes the
form

\[ \Lambda_n = \sum_{i=1}^K p_i \Lambda_n^{\theta_i} = \sum_{i=1}^K p_i \exp\Big\{\sum_{m=1}^n \big(\theta_i X_m - \psi_{\theta_i}\big)\Big\}, \quad n \in \mathbb{N}, \]

and, according to Theorem 2.2, the optimal mixing distribution $\{p_i^0\}$ is given by (2.29). By Corollary 2.1,
such a discrete mixture rule is second-order asymptotically optimal under $P_{\theta_i}$ for every $i = 1, \dots, K$, that
is, $E_{\theta_i}[T_A] = \inf_{T \in \mathcal{C}_\alpha} E_{\theta_i}[T] + O(1)$ for every $i = 1, \dots, K$. Moreover, it is asymptotically third-order
minimax with respect to the Kullback-Leibler information, i.e.,

\[ \max_{1 \le i \le K} \big(I_{\theta_i} E_{\theta_i}[T_A]\big) = \inf_{T \in \mathcal{C}_\alpha} \max_{1 \le i \le K} \big(I_{\theta_i} E_{\theta_i}[T]\big) + o(1). \]
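The discrete mixture statistic above depends on the data only through $n$ and $\sum_m X_m$, so it can be updated cheaply. The sketch below evaluates $\log \Lambda_n$ with a log-sum-exp step, which avoids overflow when one component dominates. The function name, its signature, and the grid and weights in the usage lines are assumptions made for illustration; the weights shown are arbitrary and are not the optimal ones from (2.29).

```python
import math


def log_mixture_lr(xs, thetas, weights, psi):
    """log Lambda_n for the discrete mixture over `thetas` with weights `weights`.

    `psi` is the function theta -> psi_theta of the exponential family.
    """
    n, s = len(xs), sum(xs)
    logs = [math.log(w) + th * s - n * psi(th) for th, w in zip(thetas, weights)]
    top = max(logs)
    # Subtracting the largest exponent keeps every exp() argument <= 0.
    return top + math.log(sum(math.exp(v - top) for v in logs))


# Usage with the exponential example, where psi_theta = -log(1 - theta).
psi_exp = lambda th: -math.log(1.0 - th)
grid = (0.2, 0.5, 0.8)          # hypothetical discretization Theta_K
w = (0.2, 0.3, 0.5)             # hypothetical mixing weights
value = log_mixture_lr([0.5, 1.2, 2.0], grid, w, psi_exp)
```

The mixture rule $T_A$ then stops at the first $n$ for which this value reaches $\log A$.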

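However, this discrete mixture rule is not even first-order asymptotically optimal under $P_\theta$ when $\theta \notin \Theta_K$. More specifically, we
have the following corollary of Theorem 2.1, for which we write $I_{\theta\theta^*}$ for the Kullback-Leibler divergence
of the distributions $F_\theta$ and $F_{\theta^*}$, that is,

\[ I_{\theta\theta^*} = E_\theta[S_1^\theta - S_1^{\theta^*}] = E_\theta\big[(\theta - \theta^*) X_1 - (\psi_\theta - \psi_{\theta^*})\big] = (\theta - \theta^*)\, \psi'_\theta - (\psi_\theta - \psi_{\theta^*}). \tag{3.10} \]

Corollary 3.1. Suppose that $\theta \in \Theta \setminus \Theta_K$ and that there exists a unique $\theta^* = \arg\min_{\theta_i \in \Theta_K} I_{\theta\theta_i}$. If $\psi_{\theta^*} < \theta^* \psi'_\theta$,
then $P_\theta(T_A < \infty) = 1$. If also $P_0(T_A < \infty) = \alpha$, then

\[ [I_\theta - I_{\theta\theta^*}]\, E_\theta[T_A] = |\log \alpha| + \log\Big(\sum_{i=1}^K p_i \delta_{\theta_i}\Big) + \varkappa_{\theta|\theta^*} - \log p_{\theta^*} + o(1) \quad \text{as } \alpha \to 0. \tag{3.11} \]

Proof. From Lemma 2.2 it follows that $P_\theta(T_A < \infty) = 1$ as long as $I_\theta > I_{\theta\theta^*}$, or equivalently,

\[ \theta \psi'_\theta - \psi_\theta > (\theta - \theta^*)\, \psi'_\theta - (\psi_\theta - \psi_{\theta^*}) \iff \psi_{\theta^*} < \theta^* \psi'_\theta. \]

Moreover, since the random variable $\theta^* X_1 - \psi_{\theta^*}$ has a non-arithmetic distribution with exponential moments
under $P_\theta$ for almost every $\theta$ (see Lemma 6.4 in Woodroofe (1982)), the conditions of Theorem 2.1 are
satisfied, and consequently, we obtain (3.11). □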

4. MONTE-CARLO SIMULATIONS 

In this section, we illustrate the asymptotic formulas obtained in Section 2 and check their validity with
simulation experiments in the Gaussian example where $F_0(x) = \Phi(x)$ and $F_i(x) = \Phi(x - i)$ for $i = 1, 2, 3$
($\Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^x e^{-t^2/2}\, dt$ is the standard normal distribution function, with density $\varphi$). Thus, the observations are
normally distributed with unit variance and mean that is equal to 0 under $H_0$ and is either 1 or 2 or 3 under $H_1$
($K = 3$). In this example, the quantities $\varkappa_i$ and $\delta_i$ can be computed with any precision using the following
expressions:

\[ \varkappa_i = 1 + \frac{i^2}{4} - i \sum_{n=1}^\infty \frac{1}{\sqrt{n}} \Big[\varphi\Big(\frac{i}{2}\sqrt{n}\Big) - \frac{i}{2}\sqrt{n}\; \Phi\Big(-\frac{i}{2}\sqrt{n}\Big)\Big], \tag{4.1} \]

\[ \delta_i = \frac{2}{i^2} \exp\Big\{-2 \sum_{n=1}^\infty \frac{1}{n}\, \Phi\Big(-\frac{i}{2}\sqrt{n}\Big)\Big\} \tag{4.2} \]

(see, e.g., Woodroofe (1982), p. 32).
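The series for $\varkappa_i$ and $\delta_i$ converge geometrically, so truncating them is enough in practice. The sketch below assumes the standard renewal-theoretic series for a Gaussian random walk with drift $i^2/2$ and variance $i^2$ per step; the function names and the truncation level are assumptions made for this illustration.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x):
    """Standard normal upper tail, equal to Phi(-x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def kappa(mean, terms=2000):
    """Limiting average overshoot for the Gaussian alternative with this mean."""
    total = 0.0
    for n in range(1, terms + 1):
        m = 0.5 * mean * math.sqrt(n)
        total += (norm_pdf(m) - m * norm_sf(m)) / math.sqrt(n)
    return 1.0 + mean * mean / 4.0 - mean * total


def delta(mean, terms=2000):
    """Limiting value of E[exp(-overshoot)] for the same alternative."""
    total = sum(norm_sf(0.5 * mean * math.sqrt(n)) / n for n in range(1, terms + 1))
    return 2.0 / (mean * mean) * math.exp(-2.0 * total)


for mean in (1.0, 2.0, 3.0):
    print(mean, round(kappa(mean), 3), round(delta(mean), 3))
```

Using `math.erfc` for the normal tail avoids the cancellation that `1 - Phi(x)` would suffer for large arguments.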

In Table 1, we compute these quantities, the optimal mixing distribution (2.29), as well as the mixing
distributions that we defined in (2.38). Using Table 1, we can compute the asymptotic performance loss
(2.37) for each of the corresponding mixture rules:

\[ \mathcal{L}(p^{KL}) = 0.21, \quad \mathcal{L}(p^{1/\delta}) = 0.58, \quad \mathcal{L}(p^{e^\varkappa/\delta}) = 0.85, \quad \mathcal{L}(p^u) = 1.21. \]

Table 1. Mixing distributions and quantities $\varkappa_i$ and $\delta_i$

  i    I_i    ϰ_i      δ_i      p^{e^ϰ/δ}   p^0      p^KL     p^{1/δ}   p^u
  1    0.5    0.718    0.560    0.025       0.066    0.071    0.176     0.33
  2    2      1.747    0.320    0.125       0.185    0.286    0.307     0.33
  3    4.5    3.146    0.190    0.85        0.749    0.643    0.517     0.33

In Remark 2.1 we discussed that if we set $A$ as

\[ A = \frac{\sum_{i=1}^K p_i \delta_i}{\alpha}, \tag{4.3} \]

where $p = \{p_i\}$ is the mixing distribution that defines $T_A(p)$, the probability $P_0(T_A(p) < \infty)$ is expected
to be approximately equal to $\alpha$ for sufficiently small values of $\alpha$. In Table 2, we present the actual
probabilities computed using Monte Carlo simulations. An importance sampling technique was used in these
experiments, taking advantage of the representation $P_0(T_A < \infty) = \sum_i p_i E_i[e^{-\log \Lambda_{T_A}}]$ (see (2.18)). This
allowed us to evaluate a very low error probability with a reasonable number of Monte Carlo runs. It is seen
that for all mixing distributions the formula (4.3) yields an error probability within roughly 1% of the desired
level when $\alpha \le 10^{-4}$, while the deviation is larger for $\alpha = 10^{-1}$ and $\alpha = 10^{-2}$.
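The importance sampling idea can be sketched as follows: instead of simulating under $P_0$, where the rule rarely stops, one simulates under the mixture measure $\sum_i p_i P_i$, under which the rule stops almost surely, and averages the weight $1/\Lambda_{T_A}$. The code below is an illustrative sketch for the Gaussian example with the $\delta_i$ and $\varkappa_i$ of Table 1; the function names, the number of runs, and the seed are assumptions, and the paper does not describe its simulation code.

```python
import math
import random

MEANS = (1.0, 2.0, 3.0)          # means of the Gaussian alternatives
DELTA = (0.560, 0.320, 0.190)    # from Table 1
KAPPA = (0.718, 1.747, 3.146)    # from Table 1


def mixture_lr(p, s, n):
    """Mixture likelihood ratio after n observations with running sum s."""
    return sum(w * math.exp(m * s - 0.5 * n * m * m) for w, m in zip(p, MEANS))


def estimate_error_probability(p, alpha, runs, seed=0):
    """Return (threshold, estimate of P_0(T_A < infinity))."""
    rng = random.Random(seed)
    threshold = sum(w * d for w, d in zip(p, DELTA)) / alpha   # formula (4.3)
    total = 0.0
    for _ in range(runs):
        mean = rng.choices(MEANS, weights=p)[0]   # draw the true alternative from p
        s, n, lam = 0.0, 0, 0.0
        while lam < threshold:
            n += 1
            s += rng.gauss(mean, 1.0)
            lam = mixture_lr(p, s, n)
        total += 1.0 / lam                        # likelihood-ratio weight at stopping
    return threshold, total / runs


norm = sum(math.exp(k) for k in KAPPA)
p_opt = [math.exp(k) / norm for k in KAPPA]       # mixing distribution (2.29)
thr, est = estimate_error_probability(p_opt, alpha=1e-2, runs=4000)
```

Every weight is at most $1/A$, so the estimator has a small variance even when the target probability is very low, which is what makes probabilities such as $10^{-8}$ tractable.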

Table 2. Probability $P_0(T_A(p) < \infty)$ for different mixing distributions: the first column represents the
desired error probabilities; the other columns represent the actual error probabilities obtained by
Monte Carlo simulations when the threshold is chosen according to (4.3)

  α        p^{e^ϰ/δ}        p^0              p^KL             p^{1/δ}          p^u
  10^-1    5.9979×10^-2     6.7037×10^-2     8.0337×10^-2     8.0029×10^-2     8.9314×10^-2
  10^-2    9.1127×10^-3     9.4317×10^-3     9.8754×10^-3     9.8885×10^-3     1.0049×10^-2
  10^-4    1.0104×10^-4     1.0107×10^-4     1.0027×10^-4     1.0038×10^-4     1.0011×10^-4
  10^-6    1.0017×10^-6     1.0006×10^-6     1.0009×10^-6     1.0004×10^-6     1.0008×10^-6
  10^-8    1.0008×10^-8     1.0033×10^-8     1.0002×10^-8     1.0017×10^-8     1.0006×10^-8

Table 3 allows us to verify the accuracy of the asymptotic approximation (2.36) for the Kullback-Leibler
information $\max_i (I_i E_i[T_A(p)])$ in the worst-case scenario for the optimal mixing distribution $p = p^0$ and the
uniform mixing distribution $p = p^u$. For the optimal mixing distribution $p^0$, the asymptotic approximation (2.36)
for $\max_i (I_i E_i[T_A])$ is very accurate for all studied probabilities of error $\alpha \le 0.01$. For the uniform
mixing distribution, the approximation (2.36) is considerably less accurate, but improves significantly as the
error probability goes to 0.

Table 3. The maximal expected Kullback-Leibler information $\max_i (I_i E_i[T_A(p)])$ for optimal and uniform
mixing distributions $p^0$ and $p^u$. The threshold $A$ is selected according to (4.3).

           (a) Optimal mixing distribution          (b) Uniform mixing distribution
  α        Monte Carlo   Approximation (2.36)       Monte Carlo   Approximation (2.36)
  10^-1    4.99          4.31                       5.04          5.52
  10^-2    6.36          6.61                       6.88          7.82
  10^-4    10.99         11.21                      11.87         12.42
  10^-6    15.65         15.82                      16.59         17.03
  10^-8    20.33         20.42                      21.29         21.63

5. EXTENSIONS 

Although one-sided tests have limited practical applications on their own, they can be used effectively
in the more realistic problems of testing two (or more) hypotheses and in changepoint detection
problems. Indeed, multi-hypothesis sequential tests and changepoint detection procedures are typically built
from combinations of one-sided tests; see, e.g., Lorden (1971, 1977), Tartakovsky et al. (2003), and
Tartakovsky (1998). Therefore, the results of the present paper may have certain implications for these more
practical problems, some of which we now briefly discuss.

5.1. Two-Sided Mixture Sequential Tests 

Suppose that we want to stop as soon as possible not only under $\mathcal{P}$ but also under $P_0$ and either reject $H_0$
or accept it. Then, a sequential test is a pair $(T, d_T)$ that consists of an $\{\mathcal{F}_n\}$-stopping time $T$ and an
$\mathcal{F}_T$-measurable random variable $d_T$ that takes values in $\{0, 1\}$, depending on whether the null or the alternative
hypothesis is accepted. When $\mathcal{P}$ consists of a single probability measure, say $\mathcal{P} = \{P_1\}$, the optimal test is
Wald's two-sided SPRT

\[ T^1_{A,B} = \inf\{n \ge 1 : \Lambda_n^1 \ge A \ \text{or} \ \Lambda_n^1 \le B\}, \qquad
d_{T^1_{A,B}} = \begin{cases} 1 & \text{if } \Lambda^1_{T^1_{A,B}} \ge A, \\ 0 & \text{if } \Lambda^1_{T^1_{A,B}} \le B, \end{cases} \]

where $0 < B < 1 < A$ are fixed thresholds. Indeed, as was shown by Wald and Wolfowitz (1948), the
SPRT attains both

\[ \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} E_0[T] \quad \text{and} \quad \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} E_1[T], \]

where $P_0(d_{T^1_{A,B}} = 1) = \alpha$, $P_1(d_{T^1_{A,B}} = 0) = \beta$, and

\[ \mathcal{C}_{\alpha,\beta} = \{(T, d_T) : P_0(d_T = 1) \le \alpha \ \text{and} \ P_1(d_T = 0) \le \beta\}. \]

When the alternative hypothesis consists of a discrete set of probability measures, $\mathcal{P} = \{P_1, \dots, P_K\}$, a
natural generalization of the SPRT is the two-sided mixture rule

\[ T_{A,B} = \min\{T_0(B), T_1(A)\}, \qquad d_{T_{A,B}} = 1_{\{T_1(A) < T_0(B)\}}, \]

where

\[ T_0(B) = \inf\Big\{n \ge 1 : \sum_{i=1}^K q_i \Lambda_n^i \le B\Big\}, \qquad T_1(A) = \inf\Big\{n \ge 1 : \sum_{i=1}^K p_i \Lambda_n^i \ge A\Big\}, \]

and $\{q_i\}$, $\{p_i\}$ are mixing distributions. We conjecture that if $\{p_i\}$ is chosen according to (2.29), then
$(T_{A,B}, d_{T_{A,B}})$ is almost minimax, in the sense that it attains

\[ \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} \max_{i=1,\dots,K} \big(I_i E_i[T]\big) \]

up to an $o(1)$ term as $\alpha |\log \beta| + \beta |\log \alpha| \to 0$, where $P_0(d_{T_{A,B}} = 1) = \alpha$ and $P_i(d_{T_{A,B}} = 0) = \beta$ and

\[ \mathcal{C}_{\alpha,\beta} = \Big\{(T, d_T) : P_0(d_T = 1) \le \alpha \ \text{and} \ \max_{1 \le i \le K} P_i(d_T = 0) \le \beta\Big\}. \]

However, this statement does not follow directly from our results in this paper. Moreover, it is not clear
whether $\inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} E_0[T]$ is attained up to an $o(1)$ term for some particular choice of $\{q_i\}$. This open
problem will be addressed in the future.
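To make the definition of the two-sided mixture rule concrete, the following sketch applies it to a finite sequence of observations in the Gaussian setting of Section 4 (alternatives with means 1, 2, 3). The function name, its arguments, and the thresholds in the usage lines are assumptions chosen for illustration; the sketch makes no claim about the conjectured optimality.

```python
import math

MEANS = (1.0, 2.0, 3.0)   # Gaussian alternatives, as in Section 4


def two_sided_mixture_test(xs, p, q, upper, lower):
    """Return (stopping time, decision), or (None, None) if no boundary is hit.

    decision = 1: the p-mixture reached `upper` strictly before the q-mixture
                  reached `lower` (reject H0);
    decision = 0: the q-mixture reached `lower` first or at the same time.
    """
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += x
        lrs = [math.exp(m * s - 0.5 * n * m * m) for m in MEANS]
        # The lower boundary is checked first, so ties give decision 0,
        # matching the strict inequality in the indicator above.
        if sum(w * lr for w, lr in zip(q, lrs)) <= lower:
            return n, 0
        if sum(w * lr for w, lr in zip(p, lrs)) >= upper:
            return n, 1
    return None, None


uniform = (1.0 / 3.0,) * 3
print(two_sided_mixture_test([2.0] * 10, uniform, uniform, 100.0, 0.01))
print(two_sided_mixture_test([-1.0] * 10, uniform, uniform, 100.0, 0.01))
```

Here the two mixing distributions are taken equal only for simplicity; the choice of $\{q_i\}$ is exactly the part the text leaves open.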



5.2. Sequential Changepoint Detection 

Suppose that a change occurs at an unknown time $\nu$ so that the pre-change distribution of the sequence $\{X_n\}$
is $F_0$ and the post-change distribution belongs to the set $\{F_1, \dots, F_K\}$. We denote by $P_i^\nu$ the probability
measure under which the change occurs at time $\nu$ and the post-change distribution is $F_i$. If $\nu = \infty$ (there
is never a change), then $X_n \sim F_0$ for every $n \in \mathbb{N}$, i.e., $P_i^\infty = P_0$. If $\nu = 1$ (the change occurs at the very
beginning), then $X_n \sim F_i$ for all $n \in \mathbb{N}$, i.e., $P_i^1 = P_i$. The goal is to detect the change as soon as possible
after it occurs, avoiding false alarms. Thus, a detection rule is a stopping time $T$, and one attempts to find
such $T$ that $(T - \nu)^+$ takes small values under every $P_i^\nu$, but large values under $P_0$.

Lorden (1971) showed that there is a close link between change detection rules and one-sided sequential
tests. Based on this connection, he proved that applying repeatedly the one-sided SPRT, $T_A^i$, leads to a
detection rule (the so-called CUSUM procedure) that is asymptotically optimal in the sense that it attains to
first order

\[ \inf_{T : E_0[T] \ge A} J_i[T], \tag{5.1} \]

where $J_i[T]$ is a minimax performance measure that quantifies the delay of the detection rule $T$ when the
post-change distribution is $F_i$. Using Lorden's method, it can be easily established that applying repeatedly
a mixture-based sequential test $T_A$ with $p_i > 0$ for all $i = 1, \dots, K$ leads to a detection procedure that
attains to first order (5.1) for every $i = 1, \dots, K$. However, the optimal choice of the mixing distribution
remains an open problem that we plan to consider in the future.



6. CONCLUSIONS AND FINAL REMARKS 

The main focus of this paper is on discrete, mixture-based stopping rules for testing a simple null hypothesis 
against a composite alternative hypothesis. These rules arise naturally in important practical problems, such 
as the multi-sample slippage problem, where the statistician has to decide whether one of the populations has 
"slipped to the right of the rest", without specifying which one. Discrete mixture rules are also useful when 
the alternative hypothesis is continuous, since they have certain important advantages over their continuous 
counterparts. More specifically, they asymptotically minimize the expected sample size within a constant 
(not only to first-order) at all parameter values used for their design (but they are asymptotically suboptimal 
outside of these points). However, the most important advantage of discrete mixtures is that they are easily 
implementable, which is not usually the case with continuous mixture rules.

The main contribution of this paper consists in finding an optimal mixing distribution both for discrete
and continuous mixture rules. That is, for both cases, we find mixing distributions so that the resulting
sequential tests are nearly minimax, in the sense that they minimize the maximal Kullback-Leibler information
within a negligible term $o(1)$. We believe that the methods of the present paper can be effectively used
in the more practical problems of sequential testing two or more composite hypotheses and constructing 
nearly optimal mixture-based change-point detection procedures. 

APPENDIX: PROOF OF LEMMA 2.5 

We need to find a $c^*$ such that $\mathcal{R}_c(T) \ge \mathcal{R}_c(T_{A_{Qc}})$ for every stopping time $T$ and for every $c$ smaller than
$\pi c^*$, or equivalently, for every $c$ that satisfies the inequality $Qc < \pi Q c^*$. Since $A_{Qc}$ is defined so that
$Qc < \pi$, it is clear that $c^*$ must be chosen so that $Qc^* < 1$.

Recalling that $\pi = P^\pi(H_0)$ is the prior probability of the null hypothesis $H_0$, as well as the definitions
of the probability measure $P^\pi$ and the posterior process $\{\Pi_n\}_{n \ge 1}$, for any stopping time $T$ we have

\[ \pi P_0(T < \infty) = \sum_{n=1}^\infty P^\pi(T = n, \theta = 0) = \sum_{n=1}^\infty E^\pi\big[1_{\{T = n\}} \Pi_n\big] = E^\pi\big[\Pi_T 1_{\{T < \infty\}}\big] \tag{A.1} \]

and

\[ c\,(1 - \pi) \sum_{i=1}^K p_i \big(I_i E_i[T]\big) \ge c \Big(\min_{1 \le i \le K} I_i\Big) (1 - \pi) \sum_{i=1}^K p_i E_i[T] = c \Big(\min_{1 \le i \le K} I_i\Big) E^\pi[T]. \]

Therefore,

\[ \mathcal{R}_c(T) \ge E^\pi\Big[\Pi_T 1_{\{T < \infty\}} + c \Big(\min_{1 \le i \le K} I_i\Big) T\Big]. \]

From this inequality it is clear that without any loss of generality we can restrict ourselves to $P^\pi$-a.s. finite
stopping times. Since the process $\{\Pi_n\}_{n \ge 0}$ is a bounded martingale with $\Pi_0 = \pi$, we conclude that
$\mathcal{R}_c(T) \ge E^\pi[\Pi_T] = \pi$ for every $P^\pi$-a.s. finite stopping time $T$. Hence, it suffices to find $c^*$ with $Qc^* < 1$
such that for every $c < \pi c^*$

\[ \pi \ge \mathcal{R}_c(T_{A_{Qc}}) = \pi P_0(T_{A_{Qc}} < \infty) + c\,(1 - \pi) \sum_{i=1}^K p_i \big(I_i E_i[T_{A_{Qc}}]\big). \]

From (2.27) and (A.1) it follows that

\[ \pi P_0(T_{A_{Qc}} < \infty) = E^\pi\big[\Pi_{T_{A_{Qc}}} 1_{\{T_{A_{Qc}} < \infty\}}\big] \le Qc. \tag{A.2} \]

Therefore, we must find $c^*$ with $Qc^* < 1$ such that for every $c < \pi c^*$

\[ Qc + c\,(1 - \pi) \sum_{i=1}^K p_i \big(I_i E_i[T_{A_{Qc}}]\big) \le \pi \iff (1 - \pi) \sum_{i=1}^K p_i \big(I_i E_i[T_{A_{Qc}}]\big) \le \frac{\pi}{c} - Q. \tag{A.3} \]

However, from (2.19) it follows that there exists a constant $C > 0$, which does not depend on $i$ and $A$, such
that $I_i E_i[T_A] \le \log A + C$ for any mixture rule $T_A$. Therefore,

\[ (1 - \pi) \sum_{i=1}^K p_i \big(I_i E_i[T_{A_{Qc}}]\big) \le (1 - \pi)\big[\log A_{Qc} + C\big]
= (1 - \pi)\Big[\log\Big(\frac{1 - Qc}{Qc}\, \frac{\pi}{1 - \pi}\Big) + C\Big] \]
\[ \le \log\Big(\frac{\pi}{Qc}\Big) + (1 - \pi) \log\Big(\frac{1}{1 - \pi}\Big) + C
= \frac{\pi}{Qc} \Big[\frac{Qc}{\pi} \Big|\log \frac{Qc}{\pi}\Big|\Big] + (1 - \pi)\, |\log(1 - \pi)| + C. \]

Since also $Qc < \pi$, from the inequality $\sup_{0 < x < 1} \big(x |\log x|\big) \le e^{-1}$ we have

\[ (1 - \pi) \sum_{i=1}^K p_i \big(I_i E_i[T_{A_{Qc}}]\big) \le \frac{\pi}{c} - \frac{\pi}{c}\, \frac{Qe - 1}{Qe} + e^{-1} + C. \tag{A.4} \]

Hence, from (A.3) and (A.4) it follows that it suffices to find $c^*$ with $Qc^* < 1$ such that for $c < \pi c^*$

\[ \frac{\pi}{c} - \frac{\pi}{c}\, \frac{Qe - 1}{Qe} + e^{-1} + C \le \frac{\pi}{c} - Q
\iff \frac{\pi}{c}\, \frac{Qe - 1}{Qe} \ge e^{-1} + Q + C
\iff \frac{c}{\pi} \le \frac{Qe - 1}{Qe}\, \frac{1}{e^{-1} + Q + C}. \]

Thus, it suffices to set

\[ c^* = \frac{Qe - 1}{Qe}\, \frac{1}{e^{-1} + Q + C}, \]

and this is a valid choice since

\[ Qc^* = \frac{Qe - 1}{Qe}\, \frac{Q}{e^{-1} + Q + C} < 1. \]

The proof is complete.